#1
Posted to microsoft.public.excel.programming
Speed up Array Transfer to Dictionary
Hi, I have a routine that checks an array containing duplicate entries and transfers only the unique items to a dictionary. The code works fine except that it is very slow. It is slow because each element of the array has to be compared against every other element, so if the array is large the code slows down dramatically. The array is called PR2 and the Dictionary is called Dict1. Note that I need all the duplicates, as I use them to calculate a frequency of occurrence for each one; that is why I kept the duplicate data in an array in the first place, since a dictionary cannot have duplicate keys.

Set Dict1 = New Dictionary
For t = LBound(PR2) To UBound(PR2)
    If Not Dict1.Exists(PR2(t)) Then
        For z = LBound(PR2) To UBound(PR2)
            If z = t Then
                'Do nothing: comparing this word to itself
            ElseIf z < UBound(PR2) Then
                If PR2(t) = PR2(z) Then
                    'Duplicate flagged. Don't add to Dictionary
                    dupe = dupe + 1
                End If
            Else
                'Only add to the dictionary on the last pass,
                'as a dictionary cannot have duplicate keys
                Dict1.Add PR2(t), dupe
            End If
        Next
    End If
    'Reset duplicate count
    dupe = 1
Next

Thanks
EM
#2
Posted to microsoft.public.excel.programming
Speed up Array Transfer to Dictionary
Just a thought: how about loading two arrays, one holding the uniques that you load into the dictionary, and a second used just to count the dupes? Or just count the dupes in the source directly? I think that might run faster, since there would be fewer comparisons.

"ExcelMonkey" wrote:
<original post snipped>
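A rough sketch of the two-array idea above (aircode, untested; the names Seen, Counts and CountWithTwoArrays are illustrative, not from the thread). One pass over PR2 keeps an array of uniques found so far and bumps a parallel count array when a value has already been seen, so each element is compared only against the uniques rather than against the whole array.

```vb
'Aircode sketch of the two-array approach; Seen/Counts names are made up.
Sub CountWithTwoArrays(PR2() As String)
    Dim Seen() As String, Counts() As Long
    Dim n As Long, t As Long, k As Long, hit As Long
    ReDim Seen(LBound(PR2) To UBound(PR2))
    ReDim Counts(LBound(PR2) To UBound(PR2))
    n = LBound(PR2) - 1                 'index of last unique stored

    For t = LBound(PR2) To UBound(PR2)
        hit = -1
        For k = LBound(PR2) To n        'scan only the uniques found so far
            If Seen(k) = PR2(t) Then hit = k: Exit For
        Next k
        If hit = -1 Then                'first sighting: record it
            n = n + 1
            Seen(n) = PR2(t)
            Counts(n) = 1
        Else                            'duplicate: just bump the count
            Counts(hit) = Counts(hit) + 1
        End If
    Next t

    For k = LBound(PR2) To n
        Debug.Print Seen(k) & " " & Counts(k)
    Next k
End Sub
```

This is still quadratic in the worst case (all values unique), but when there are many duplicates the inner scan stays short, which is where the savings come from.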
#3
Posted to microsoft.public.excel.programming
Speed up Array Transfer to Dictionary
Right, but I start out with an array that contains duplicates. The only way to get a second array without duplicates is to compare the entries of the first array against one another, hence the time consumption. My post was slightly mislabeled: the slow processing comes not from transferring the data from the array to the dictionary, but from comparing the array's data against itself to find the duplicates.

Thanks
EM

"xp" wrote:
<snipped>
#4
Posted to microsoft.public.excel.programming
Speed up Array Transfer to Dictionary
Have you considered something like the following? (Aircode warning.) The goals here are (a) to reduce the number of comparisons, and (b) to avoid a full loop of comparisons every time you hit a known duplicate.

Dim mylist(1 To 100, 1 To 2)

For i = LBound(mylist, 1) To UBound(mylist, 1)  '1 to 100
    If mylist(i, 2) = "" Then  'no matches yet
        'add a non-dupe count, or add to your dictionary here
        For p = (i + 1) To UBound(mylist, 1)
            If mylist(p, 1) = mylist(i, 1) Then
                mylist(p, 2) = "Duplicate"
                DupeCount = DupeCount + 1
            End If
        Next
    End If
Next

"ExcelMonkey" wrote:
<snipped>
#5
Posted to microsoft.public.excel.programming
Speed up Array Transfer to Dictionary
> The only way to get a second array without duplicates is to compare
> individual entries in the first array with themselves. Hence the time
> consumption.

Hi. If I understand the problem, the Dictionary itself is more efficient. I believe you want a list of unique entries along with their counts, which you would then enter into your main dictionary. The main idea is to use the Item property. My suggestion would be to use a temporary dictionary to reduce the data, then enter this into the main dictionary. Here is a small example that should give you some ideas.

Sub Demo()
    'Requires a VBA reference to 'Microsoft Scripting Runtime'
    Dim J As Long
    Dim Dic As Dictionary
    Set Dic = New Dictionary

    Dim T(1 To 5)  'An array with some duplicate data
    T(1) = 11
    T(2) = 12
    T(3) = 11
    T(4) = 13
    T(5) = 12

    For J = LBound(T) To UBound(T)
        If Dic.Exists(T(J)) Then
            Dic.Item(T(J)) = Dic.Item(T(J)) + 1
        Else
            Dic.Add T(J), 1
        End If
    Next J

    Debug.Print Dic(11)
    Debug.Print Dic(12)
    Debug.Print Dic(13)
End Sub

Returns:
2
2
1

HTH
Dana DeLouis

ExcelMonkey wrote:
<snipped>
#6
Posted to microsoft.public.excel.programming
Speed up Array Transfer to Dictionary
Another way to get around this would be to check whether the item in the array already exists in the Dictionary. If it does not exist, add it with an item of 1. If it does exist, don't add anything; simply update the item for that specific key. I think this allows me to avoid the comparison within the array itself. See below.

Sub DumpArrayToDictionary()
    Dim PR2() As String
    ReDim PR2(0 To 6)
    PR2(0) = "cat"
    PR2(1) = "dog"
    PR2(2) = "mouse"
    PR2(3) = "cat"
    PR2(4) = "dog"
    PR2(5) = "dog"
    PR2(6) = "dog"

    dupe = 1
    Set Dict1 = New Dictionary
    For t = LBound(PR2) To UBound(PR2)
        If Not Dict1.Exists(PR2(t)) Then
            Dict1.Add PR2(t), dupe
        Else
            Dict1.Item(PR2(t)) = Dict1.Item(PR2(t)) + 1
        End If
    Next

    Dim cumcount As Double
    cumcount = 0
    For Each Item In Dict1
        Debug.Print Item & " " & Dict1(Item)
        cumcount = cumcount + Dict1(Item)
    Next
    Debug.Print cumcount
End Sub

"ExcelMonkey" wrote:
<snipped>
#7
Posted to microsoft.public.excel.programming
Speed up Array Transfer to Dictionary
Thanks Dana. I posted my result prior to seeing yours; I think they are pretty much the same.

Thanks
EM

"Dana DeLouis" wrote:
<snipped>