![]() |
Regex Pattern
Hi
The aim is to identify anomalous strings in user data using a Regex pattern but as a complete newbie I have got bogged down. My test code is below but as you will see there are some "bad" results. The criteria - a string can be: text only = Head, Dept Head text plus numerics = Class 8-10, Class 8, 6th form, Head KS6 a single dot If numerics are included then: a single or double digit must be followed by a hyphen or an ordinal = 1st, 10-12, 15th there must be no lead zeros = 0 or 01270 ordinals must not be hyphenated = 2nd-3rd, 2nd-4 My Regex doesn't yet include a test for text only and wondered if that had to be a separate process? I would be grateful of any help to complete the pattern. T.I.A. Geoff Sub TestPattern() Dim objRegExp As Object Dim i As Long '''some test and result strings Dim arrString() As Variant Dim arrResult() As Variant arrString = Array("String", "0", "class 1-", "8-", "8-10", _ "12", "12-15", "165", "12-165", "0 12", "Class 8", "1st class", _ "2nd class", "3rd class", "6th form", "40th class", "400th class", _ "Head", "Head KS6", ".") arrResult = Array("Result", "ok", "ok", "ok", "ok", "ok", "ok", _ "ok", "bad", "bad", "bad", "ok", "ok", "ok", "ok", "ok", "ok", _ "bad", "bad", "bad") '''set pattern Set objRegExp = CreateObject("Vbscript.RegExp") With objRegExp .Global = True .IgnoreCase = True .MultiLine = True .Pattern = "\b[1-9](-|st|nd|rd|th)\b|\b[1-9][0-9](|-|st|nd|rd|th)\b" End With With Sheets(1) '''setup test strings .Range("a:c").ClearContents .Range("a:c").NumberFormat = "@" .Cells(1, "B") = "Regex Test" For i = LBound(arrString) To UBound(arrString) .Cells(i + 1, "A") = arrString(i) .Cells(i + 1, "C") = arrResult(i) Next '''run test pattern For i = LBound(arrString) + 1 To UBound(arrString) If objRegExp.Test(.Cells(i + 1, 1)) Then .Cells(i + 1, "B") = "valid" Else .Cells(i + 1, "B") = "invalid" End If Next End With End Sub |
Regex Pattern
you must use in excel application?
Your example is not clear to me you can make a list of cases ok and a list of cases bad with poor reasoning in the list example set all your text between "" I'm sorry for my english regards r Il mio ultimo lavoro ... http://excelvba.altervista.org/blog/...ternative.html "Geoff K" wrote: Hi The aim is to identify anomalous strings in user data using a Regex pattern but as a complete newbie I have got bogged down. My test code is below but as you will see there are some "bad" results. The criteria - a string can be: text only = Head, Dept Head text plus numerics = Class 8-10, Class 8, 6th form, Head KS6 a single dot If numerics are included then: a single or double digit must be followed by a hyphen or an ordinal = 1st, 10-12, 15th there must be no lead zeros = 0 or 01270 ordinals must not be hyphenated = 2nd-3rd, 2nd-4 My Regex doesn't yet include a test for text only and wondered if that had to be a separate process? I would be grateful of any help to complete the pattern. T.I.A. Geoff Sub TestPattern() Dim objRegExp As Object Dim i As Long '''some test and result strings Dim arrString() As Variant Dim arrResult() As Variant arrString = Array("String", "0", "class 1-", "8-", "8-10", _ "12", "12-15", "165", "12-165", "0 12", "Class 8", "1st class", _ "2nd class", "3rd class", "6th form", "40th class", "400th class", _ "Head", "Head KS6", ".") arrResult = Array("Result", "ok", "ok", "ok", "ok", "ok", "ok", _ "ok", "bad", "bad", "bad", "ok", "ok", "ok", "ok", "ok", "ok", _ "bad", "bad", "bad") '''set pattern Set objRegExp = CreateObject("Vbscript.RegExp") With objRegExp .Global = True .IgnoreCase = True .MultiLine = True .Pattern = "\b[1-9](-|st|nd|rd|th)\b|\b[1-9][0-9](|-|st|nd|rd|th)\b" End With With Sheets(1) '''setup test strings .Range("a:c").ClearContents .Range("a:c").NumberFormat = "@" .Cells(1, "B") = "Regex Test" For i = LBound(arrString) To UBound(arrString) .Cells(i + 1, "A") = arrString(i) .Cells(i + 1, "C") = arrResult(i) Next '''run test pattern For i = LBound(arrString) + 1 To UBound(arrString) If objRegExp.Test(.Cells(i + 1, 1)) Then .Cells(i + 1, "B") = "valid" Else .Cells(i + 1, "B") = "invalid" End If Next End With End Sub |
Regex Pattern
On Tue, 9 Jun 2009 06:24:01 -0700, Geoff K
wrote: Hi The aim is to identify anomalous strings in user data using a Regex pattern but as a complete newbie I have got bogged down. My test code is below but as you will see there are some "bad" results. The criteria - a string can be: text only = Head, Dept Head text plus numerics = Class 8-10, Class 8, 6th form, Head KS6 a single dot If numerics are included then: a single or double digit must be followed by a hyphen or an ordinal = 1st, 10-12, 15th there must be no lead zeros = 0 or 01270 ordinals must not be hyphenated = 2nd-3rd, 2nd-4 My Regex doesn't yet include a test for text only and wondered if that had to be a separate process? I would be grateful of any help to complete the pattern. T.I.A. Geoff It shouldn't be that hard to do, but your criteria are not congruent with your examples. For example, you give as an example in the text plus numerics category "Class 8" and "Head KS6". But your criteria state that if numerics are included, a single or double digit must be followed by a hyphen or an ordinal. That is not present in these examples. You also give "Class 8-10" as an example. But the 10 is not followed by a hyphen or an ordinal. In addition, are these single entries within a cell; or are there multiple entries within a cell as shown on the various lines you have provided? --ron |
Regex Pattern
On Wed, 10 Jun 2009 03:51:01 -0700, Geoff K
wrote: Ron In running through the databases with the new pattern I now realise the significance of KS3, KS4, KS5 etc. KS is an acroynym for Key Stage and this appears quite frequently in the lists. It would be very useful to "allow" this through and I wondered if you would be kind enough to amend the pattern accordingly. It can take the form of KS3 Manager, Manager KS4 or expanded to Key Stage 5 Manager or Manager Key Stage 6. Is this possible? Re speed, I don't have a problem. Thank you so far. Geoff The entries where the string ended with a digit should have been being flagged as OK. I don't have time to work on this for the next few days, but if your valid phrases are all as above, you could just add, via alternation, to the existing regex. e.g. (for one digit KS entries): (^|[^-])[1-9]\d?(st|nd|rd|th)(?!-)|((^|\D)[1-9]\d?-[1-9]\d?(?=\D|$))|(^(\D+|\.)$)|([a-z]\s?)[1-9]\d?$|KS[1-9]|Stage\s+[1-9] (for two digit KS entries, just add the \d? after each [1-9] as in KS[1-9]\d? --ron |
Regex Pattern
Ron
Great. Thank you for the help and example. That is excellent advice. I can progress from there - i hope <g Geoff |
Regex Pattern
On Wed, 10 Jun 2009 07:13:04 -0700, Geoff K
wrote: Ron Great. Thank you for the help and example. That is excellent advice. I can progress from there - i hope <g Geoff Glad to help. Thanks for the feedback. --ron |
All times are GMT +1. The time now is 04:40 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com