Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Pattern match and grouped items
Dear all
I have in : -- Column A, a list of items say, 1 YORK A/COND 2 SANYO 1.0HP air-cond 3 CANON laser printer (model : L-P'TER*LBP-3050) 4 EPSON - Batching system printer 5 EPSON - Dot matrix printer 6 Nokia 1200 - handphone 7 HP printer 8 Panasonic K Printer 9 Panasonic X printer 10 Panasonic P-P1121 printer 11 Nokia N6070 handphone 12 Nokia 1200 handphone 13 Microphone 14 Panasonic printer 15 2nd hand hand phone I am looking for a function / vba / anyway if excel can do a pattern match and : - highlight (with color) items 1,2 and return in column B the word "cond" (i.e. same word contains in both the the items) - highlight (with different color) items 3,4,5,7,8,9,10,14 and return in column C the word "printer" (i.e. same word contains in all these items) - highlight (with different color) items 6,11,12,13,15 and return in column D the word "phone" (i.e. same word contains in all these items) Any suggestion or help on this is greatly appreciated. Thanks heaps ! Everyone. SauQ |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Pattern match and grouped items
On Thu, 2 Jul 2009 18:46:14 -0700 (PDT), SauQ wrote: Dear all I have in : -- Column A, a list of items say, 1 YORK A/COND 2 SANYO 1.0HP air-cond 3 CANON laser printer (model : L-P'TER*LBP-3050) 4 EPSON - Batching system printer 5 EPSON - Dot matrix printer 6 Nokia 1200 - handphone 7 HP printer 8 Panasonic K Printer 9 Panasonic X printer 10 Panasonic P-P1121 printer 11 Nokia N6070 handphone 12 Nokia 1200 handphone 13 Microphone 14 Panasonic printer 15 2nd hand hand phone I am looking for a function / vba / anyway if excel can do a pattern match and : - highlight (with color) items 1,2 and return in column B the word "cond" (i.e. same word contains in both the the items) - highlight (with different color) items 3,4,5,7,8,9,10,14 and return in column C the word "printer" (i.e. same word contains in all these items) - highlight (with different color) items 6,11,12,13,15 and return in column D the word "phone" (i.e. same word contains in all these items) Any suggestion or help on this is greatly appreciated. Thanks heaps ! Everyone. SauQ How do you define which of the words or portions of a word you want to match? Examples: EPSON is found in lines 4,5 printer is found in multiple lines and the range containing "printer" overlaps the range containing "EPSON". How to color ranges with overlapping duplicates? Also, you write that you want to match the word "phone", yet that is only a portion of the word "handphone" in a number of the lines you are matching. --ron |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Pattern match and grouped items
Dear Ron, you are absolutely correct, silly of me not to notice there
are other duplicate words (Nokia, Panasonic, Epson) as well. In this case, i need excel to highlight rows with certain % of words matched. Maybe an option to choose / select the % to be highlighted (e.g: above 50% match found is to be highlighted). Many thanks again, SauQ Ron Rosenfeld wrote: On Thu, 2 Jul 2009 18:46:14 -0700 (PDT), SauQ wrote: Dear all I have in : -- Column A, a list of items say, 1 YORK A/COND 2 SANYO 1.0HP air-cond 3 CANON laser printer (model : L-P'TER*LBP-3050) 4 EPSON - Batching system printer 5 EPSON - Dot matrix printer 6 Nokia 1200 - handphone 7 HP printer 8 Panasonic K Printer 9 Panasonic X printer 10 Panasonic P-P1121 printer 11 Nokia N6070 handphone 12 Nokia 1200 handphone 13 Microphone 14 Panasonic printer 15 2nd hand hand phone I am looking for a function / vba / anyway if excel can do a pattern match and : - highlight (with color) items 1,2 and return in column B the word "cond" (i.e. same word contains in both the the items) - highlight (with different color) items 3,4,5,7,8,9,10,14 and return in column C the word "printer" (i.e. same word contains in all these items) - highlight (with different color) items 6,11,12,13,15 and return in column D the word "phone" (i.e. same word contains in all these items) Any suggestion or help on this is greatly appreciated. Thanks heaps ! Everyone. SauQ How do you define which of the words or portions of a word you want to match? Examples: EPSON is found in lines 4,5 printer is found in multiple lines and the range containing "printer" overlaps the range containing "EPSON". How to color ranges with overlapping duplicates? Also, you write that you want to match the word "phone", yet that is only a portion of the word "handphone" in a number of the lines you are matching. --ron |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Pattern match and grouped items
On Thu, 2 Jul 2009 21:23:54 -0700 (PDT), SauQ wrote:
Dear Ron, you are absolutely correct, silly of me not to notice there are other duplicate words (Nokia, Panasonic, Epson) as well. In this case, i need excel to highlight rows with certain % of words matched. Maybe an option to choose / select the % to be highlighted (e.g: above 50% match found is to be highlighted). Many thanks again, SauQ I think you need a better description of what might constitute a valid match; and also rules for what should happen when the same line matches other lines, but with different words (e.g. Epson and printer have overlapping matchings). Perhaps it would be best to back up and look at the purpose of this process. For example, if you are trying to find how many entries there are that refer to, let us say, "phones", it would probably be a lot simpler to develop a list of keywords that you would search for. For example, if you decided that your list of key words was phone printer cond you could enter those words in a series of cells, and then just check the data against those cells to see what's where. You'd still have the problem of multiple matchings, but you could probably flag the entry somehow. --ron |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
Pattern match and grouped items
Hi Ron
The purpose of this process is to come up with a total cost for all similar items (e.g the total cost of printers, regardless of brand - Epson/Panasonic). The problem is to come up with a list of keywords of items, as there are a few thousand rows of different items with different brands (and many are overlapping matches as you have pointed out). I do not know entirely what are the items contain in the list. I was hoping that if excel could highlight / identify these items (say, keyword=printer) which are similar then I could use the SUMPRODUCT function to total up all cost based on the keywords. But then again, excel obviously would not know how to distinguish between an item (handphone, printer, cond, etc) from brands (Epson, panasonic, etc). I reckon the best way is to get excel to list all duplicate words (regardless if its item / brand), then I will manually pick out the items from the list. I wonder if there is a better way around this. Many thanks again for your effort and time in looking at this. SauQ Ron Rosenfeld wrote: On Thu, 2 Jul 2009 21:23:54 -0700 (PDT), SauQ wrote: Dear Ron, you are absolutely correct, silly of me not to notice there are other duplicate words (Nokia, Panasonic, Epson) as well. In this case, i need excel to highlight rows with certain % of words matched. Maybe an option to choose / select the % to be highlighted (e.g: above 50% match found is to be highlighted). Many thanks again, SauQ I think you need a better description of what might constitute a valid match; and also rules for what should happen when the same line matches other lines, but with different words (e.g. Epson and printer have overlapping matchings). Perhaps it would be best to back up and look at the purpose of this process. For example, if you are trying to find how many entries there are that refer to, let us say, "phones", it would probably be a lot simpler to develop a list of keywords that you would search for. For example, if you decided that your list of key words was phone printer cond you could enter those words in a series of cells, and then just check the data against those cells to see what's where. You'd still have the problem of multiple matchings, but you could probably flag the entry somehow. --ron |
#6
Posted to microsoft.public.excel.programming
|
|||
|
|||
Pattern match and grouped items
On Sun, 5 Jul 2009 19:47:52 -0700 (PDT), SauQ wrote: Hi Ron The purpose of this process is to come up with a total cost for all similar items (e.g the total cost of printers, regardless of brand - Epson/Panasonic). The problem is to come up with a list of keywords of items, as there are a few thousand rows of different items with different brands (and many are overlapping matches as you have pointed out). I do not know entirely what are the items contain in the list. I was hoping that if excel could highlight / identify these items (say, keyword=printer) which are similar then I could use the SUMPRODUCT function to total up all cost based on the keywords. But then again, excel obviously would not know how to distinguish between an item (handphone, printer, cond, etc) from brands (Epson, panasonic, etc). I reckon the best way is to get excel to list all duplicate words (regardless if its item / brand), then I will manually pick out the items from the list. I wonder if there is a better way around this. Many thanks again for your effort and time in looking at this. SauQ Ron Rosenfeld wrote: On Thu, 2 Jul 2009 21:23:54 -0700 (PDT), SauQ wrote: Dear Ron, you are absolutely correct, silly of me not to notice there are other duplicate words (Nokia, Panasonic, Epson) as well. In this case, i need excel to highlight rows with certain % of words matched. Maybe an option to choose / select the % to be highlighted (e.g: above 50% match found is to be highlighted). Many thanks again, SauQ I think you need a better description of what might constitute a valid match; and also rules for what should happen when the same line matches other lines, but with different words (e.g. Epson and printer have overlapping matchings). Perhaps it would be best to back up and look at the purpose of this process. For example, if you are trying to find how many entries there are that refer to, let us say, "phones", it would probably be a lot simpler to develop a list of keywords that you would search for. For example, if you decided that your list of key words was phone printer cond you could enter those words in a series of cells, and then just check the data against those cells to see what's where. You'd still have the problem of multiple matchings, but you could probably flag the entry somehow. --ron I think you will have to develop your keyword list manually. There are routines that can generate a list of individual words, sorted either alphabetically or by word count. Then you can manually decide which you want to use as keywords (perhaps with wild cards) in a SUMPRODUCT or SUMIF formula. Given your initial example: YORK A/COND SANYO 1.0HP air-cond CANON laser printer (model : L-P'TER*LBP-3050) EPSON - Batching system printer EPSON - Dot matrix printer Nokia 1200 - handphone HP printer Panasonic K Printer Panasonic X printer Panasonic P-P1121 printer Nokia N6070 handphone Nokia 1200 handphone Microphone Panasonic printer 2nd hand hand phone And defining a "word" as having only letters, digits or hyphens; and also being at least two characters in length, here is a list of individual words sorted by frequency (descending): printer 8 Panasonic 4 Nokia 3 handphone 3 COND 2 EPSON 2 1200 2 hand 2 YORK 1 SANYO 1 0HP 1 air-cond 1 CANON 1 laser 1 model 1 L-P 1 Batching 1 system 1 Dot 1 matrix 1 HP 1 P-P1121 1 N6070 1 Microphone 1 2nd 1 phone 1 And here is a list sorted alphabetically (ascending): 0HP 1 1200 2 2nd 1 air-cond 1 Batching 1 CANON 1 COND 2 Dot 1 EPSON 2 hand 2 handphone 3 HP 1 laser 1 L-P 1 matrix 1 Microphone 1 model 1 N6070 1 Nokia 3 Panasonic 4 phone 1 P-P1121 1 printer 8 SANYO 1 system 1 YORK 1 There are instructions in the macro posted below for how to change from numeric to alpha sorting; and also how to change from ascending to descending sorts. Also, in the macro, as of now, the data source (rSrc) is set to Selection; and the output destination is set to start at C1 (rDest). These can be changed or set up in different ways. Maybe this will help. To enter this Macro (Sub), <alt-F11 opens the Visual Basic Editor. Ensure your project is highlighted in the Project Explorer window. Then, from the top menu, select Insert/Module and paste the code below into the window that opens. To use this Macro (Sub): Select your data range (unless you have modified the Sub to do it automatically). Then <alt-F8 opens the macro dialog box. Select the macro by name, and <RUN. =================================== Option Explicit Option Compare Text Sub UniqueWordList() Dim rSrc As Range, rDest As Range, c As Range Dim cWordList As Collection Dim res() As Variant Dim w() As String Dim i As Long Set cWordList = New Collection Set rSrc = Selection Set rDest = Range("C1") rDest.EntireColumn.NumberFormat = "@" For Each c In rSrc w = Split(c.Value) For i = 0 To UBound(w) w(i) = StripWord(w(i)) If Not w(i) = "" Then On Error Resume Next cWordList.Add Item:=w(i), Key:=w(i) On Error GoTo 0 End If Next i Next c 'transfer words to results array ReDim res(1 To cWordList.Count, 0 To 1) For i = 1 To cWordList.Count res(i, 0) = cWordList(i) Next i 'get counts For i = LBound(res) To UBound(res) For Each c In rSrc res(i, 1) = res(i, 1) + CountWord(c.Value, res(i, 0)) Next c Next i 'sort alpha: d=0; sort numeric d=1 'there are various ways of sorting BubbleSort res, 0 rDest.CurrentRegion.Clear For i = LBound(res) To UBound(res) rDest.Offset(i, 0).NumberFormat = "@" rDest.Offset(i, 0).Value = res(i, 0) 'For just lowercase output, use: 'rDest.Offset(i, 0).Value = LCase(res(i, 0)) rDest.Offset(i, 1).Value = res(i, 1) Next i End Sub Private Function StripWord(s As String) As String Dim re As Object, mc As Object, m As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True re.Pattern = "[-\w]{2,}" If re.test(s) = True Then Set mc = re.Execute(s) StripWord = mc(0).Value End If Set re = Nothing End Function Private Function CountWord(ByVal s As String, sPat) As Long Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.Global = True re.IgnoreCase = True re.Pattern = "\b" & sPat & "\b" Set mc = re.Execute(s) CountWord = mc.Count End Function Private Sub BubbleSort(TempArray As Variant, d As Long) 'd is 0 based dimension Dim temp(0, 1) As Variant Dim i As Integer Dim NoExchanges As Integer ' Loop until no more "exchanges" are made. Do NoExchanges = True ' Loop through each element in the array. For i = LBound(TempArray) To UBound(TempArray) - 1 ' If the element is less than the element ' following it, exchange the two elements. ' change "<" to "" to sort ascending If TempArray(i, d) TempArray(i + 1, d) Then NoExchanges = False temp(0, 0) = TempArray(i, 0) temp(0, 1) = TempArray(i, 1) TempArray(i, 0) = TempArray(i + 1, 0) TempArray(i, 1) = TempArray(i + 1, 1) TempArray(i + 1, 0) = temp(0, 0) TempArray(i + 1, 1) = temp(0, 1) End If Next i Loop While Not (NoExchanges) End Sub ================================================== === --ron |
#7
Posted to microsoft.public.excel.programming
|
|||
|
|||
Pattern match and grouped items
Hi Ron,
I have tried your sub on my other longer list and it works like charm. The results spanned slightly over a 100 lines, compared to a few thousand lines that I am stuck with earlier on, I' ll say your solution is a major improvement in efficiency though I still have to pick up the ones I want manually. My zillion thanks to you. Heartfelt appreciation for your effort and time. SauQ Ron Rosenfeld wrote: On Sun, 5 Jul 2009 19:47:52 -0700 (PDT), SauQ wrote: Hi Ron The purpose of this process is to come up with a total cost for all similar items (e.g the total cost of printers, regardless of brand - Epson/Panasonic). The problem is to come up with a list of keywords of items, as there are a few thousand rows of different items with different brands (and many are overlapping matches as you have pointed out). I do not know entirely what are the items contain in the list. I was hoping that if excel could highlight / identify these items (say, keyword=printer) which are similar then I could use the SUMPRODUCT function to total up all cost based on the keywords. But then again, excel obviously would not know how to distinguish between an item (handphone, printer, cond, etc) from brands (Epson, panasonic, etc). I reckon the best way is to get excel to list all duplicate words (regardless if its item / brand), then I will manually pick out the items from the list. I wonder if there is a better way around this. Many thanks again for your effort and time in looking at this. SauQ Ron Rosenfeld wrote: On Thu, 2 Jul 2009 21:23:54 -0700 (PDT), SauQ wrote: Dear Ron, you are absolutely correct, silly of me not to notice there are other duplicate words (Nokia, Panasonic, Epson) as well. In this case, i need excel to highlight rows with certain % of words matched. Maybe an option to choose / select the % to be highlighted (e.g: above 50% match found is to be highlighted). Many thanks again, SauQ I think you need a better description of what might constitute a valid match; and also rules for what should happen when the same line matches other lines, but with different words (e.g. Epson and printer have overlapping matchings). Perhaps it would be best to back up and look at the purpose of this process. For example, if you are trying to find how many entries there are that refer to, let us say, "phones", it would probably be a lot simpler to develop a list of keywords that you would search for. For example, if you decided that your list of key words was phone printer cond you could enter those words in a series of cells, and then just check the data against those cells to see what's where. You'd still have the problem of multiple matchings, but you could probably flag the entry somehow. --ron I think you will have to develop your keyword list manually. There are routines that can generate a list of individual words, sorted either alphabetically or by word count. Then you can manually decide which you want to use as keywords (perhaps with wild cards) in a SUMPRODUCT or SUMIF formula. Given your initial example: YORK A/COND SANYO 1.0HP air-cond CANON laser printer (model : L-P'TER*LBP-3050) EPSON - Batching system printer EPSON - Dot matrix printer Nokia 1200 - handphone HP printer Panasonic K Printer Panasonic X printer Panasonic P-P1121 printer Nokia N6070 handphone Nokia 1200 handphone Microphone Panasonic printer 2nd hand hand phone And defining a "word" as having only letters, digits or hyphens; and also being at least two characters in length, here is a list of individual words sorted by frequency (descending): printer 8 Panasonic 4 Nokia 3 handphone 3 COND 2 EPSON 2 1200 2 hand 2 YORK 1 SANYO 1 0HP 1 air-cond 1 CANON 1 laser 1 model 1 L-P 1 Batching 1 system 1 Dot 1 matrix 1 HP 1 P-P1121 1 N6070 1 Microphone 1 2nd 1 phone 1 And here is a list sorted alphabetically (ascending): 0HP 1 1200 2 2nd 1 air-cond 1 Batching 1 CANON 1 COND 2 Dot 1 EPSON 2 hand 2 handphone 3 HP 1 laser 1 L-P 1 matrix 1 Microphone 1 model 1 N6070 1 Nokia 3 Panasonic 4 phone 1 P-P1121 1 printer 8 SANYO 1 system 1 YORK 1 There are instructions in the macro posted below for how to change from numeric to alpha sorting; and also how to change from ascending to descending sorts. Also, in the macro, as of now, the data source (rSrc) is set to Selection; and the output destination is set to start at C1 (rDest). These can be changed or set up in different ways. Maybe this will help. To enter this Macro (Sub), <alt-F11 opens the Visual Basic Editor. Ensure your project is highlighted in the Project Explorer window. Then, from the top menu, select Insert/Module and paste the code below into the window that opens. To use this Macro (Sub): Select your data range (unless you have modified the Sub to do it automatically). Then <alt-F8 opens the macro dialog box. Select the macro by name, and <RUN. =================================== Option Explicit Option Compare Text Sub UniqueWordList() Dim rSrc As Range, rDest As Range, c As Range Dim cWordList As Collection Dim res() As Variant Dim w() As String Dim i As Long Set cWordList = New Collection Set rSrc = Selection Set rDest = Range("C1") rDest.EntireColumn.NumberFormat = "@" For Each c In rSrc w = Split(c.Value) For i = 0 To UBound(w) w(i) = StripWord(w(i)) If Not w(i) = "" Then On Error Resume Next cWordList.Add Item:=w(i), Key:=w(i) On Error GoTo 0 End If Next i Next c 'transfer words to results array ReDim res(1 To cWordList.Count, 0 To 1) For i = 1 To cWordList.Count res(i, 0) = cWordList(i) Next i 'get counts For i = LBound(res) To UBound(res) For Each c In rSrc res(i, 1) = res(i, 1) + CountWord(c.Value, res(i, 0)) Next c Next i 'sort alpha: d=0; sort numeric d=1 'there are various ways of sorting BubbleSort res, 0 rDest.CurrentRegion.Clear For i = LBound(res) To UBound(res) rDest.Offset(i, 0).NumberFormat = "@" rDest.Offset(i, 0).Value = res(i, 0) 'For just lowercase output, use: 'rDest.Offset(i, 0).Value = LCase(res(i, 0)) rDest.Offset(i, 1).Value = res(i, 1) Next i End Sub Private Function StripWord(s As String) As String Dim re As Object, mc As Object, m As Object Set re = CreateObject("vbscript.regexp") re.IgnoreCase = True re.Global = True re.Pattern = "[-\w]{2,}" If re.test(s) = True Then Set mc = re.Execute(s) StripWord = mc(0).Value End If Set re = Nothing End Function Private Function CountWord(ByVal s As String, sPat) As Long Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.Global = True re.IgnoreCase = True re.Pattern = "\b" & sPat & "\b" Set mc = re.Execute(s) CountWord = mc.Count End Function Private Sub BubbleSort(TempArray As Variant, d As Long) 'd is 0 based dimension Dim temp(0, 1) As Variant Dim i As Integer Dim NoExchanges As Integer ' Loop until no more "exchanges" are made. Do NoExchanges = True ' Loop through each element in the array. For i = LBound(TempArray) To UBound(TempArray) - 1 ' If the element is less than the element ' following it, exchange the two elements. ' change "<" to "" to sort ascending If TempArray(i, d) TempArray(i + 1, d) Then NoExchanges = False temp(0, 0) = TempArray(i, 0) temp(0, 1) = TempArray(i, 1) TempArray(i, 0) = TempArray(i + 1, 0) TempArray(i, 1) = TempArray(i + 1, 1) TempArray(i + 1, 0) = temp(0, 0) TempArray(i + 1, 1) = temp(0, 1) End If Next i Loop While Not (NoExchanges) End Sub ================================================== === --ron |
#8
Posted to microsoft.public.excel.programming
|
|||
|
|||
Pattern match and grouped items
On Mon, 6 Jul 2009 21:02:05 -0700 (PDT), SauQ wrote: Hi Ron, I have tried your sub on my other longer list and it works like charm. The results spanned slightly over a 100 lines, compared to a few thousand lines that I am stuck with earlier on, I' ll say your solution is a major improvement in efficiency though I still have to pick up the ones I want manually. My zillion thanks to you. Heartfelt appreciation for your effort and time. SauQ I'm glad it was useful. Thanks for the feedback. And yes, I don't know of a good way to avoid manually picking up the keywords on which you want to focus. But once you do, it should be relatively simple to derive appropriate functions/formulas to analyze. --ron |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Grouped Items won't Stay Grouped When Moving Object | Excel Discussion (Misc queries) | |||
How to retrieve grouped items data | Excel Discussion (Misc queries) | |||
Remove grouped items and fields from pivot table | Excel Discussion (Misc queries) | |||
Pivot table-show all grouped items | Excel Discussion (Misc queries) | |||
Project Window in VBE not displaying items grouped properly | Excel Programming |