![]() |
Elimination of Duplicates in an Array and its reduction.
I have two large columns arrays A1:A4000 and B1:B4000
A: contains words alphabetically sorted and B: their corresponding page numbers. I need to reduce all duplicates of a word to one row with all their number occurences on adjacent column cells of the same row. For example A B C D E F word A 21 word A 21 25 word A 25 to word B 10 15 17 word B 10 word C 3 word B 15 word B 17 word C 3 How do I do this? |
Elimination of Duplicates in an Array and its reduction.
The formating in my example in this post got mangled!
I'll try again. A B WordA 21 WordA 25 WordB 10 WordB 15 WordB 17 WordC 3 Down to C D E F WordA 21 25 WordB 10 15 17 WordC 3 "Joe" wrote: I have two large columns arrays A1:A4000 and B1:B4000 A: contains words alphabetically sorted and B: their corresponding page numbers. I need to reduce all duplicates of a word to one row with all their number occurences on adjacent column cells of the same row. For example A B C D E F word A 21 word A 21 25 word A 25 to word B 10 15 17 word B 10 word C 3 word B 15 word B 17 word C 3 How do I do this? |
Elimination of Duplicates in an Array and its reduction.
Here's one way.
Use the Advanced filter to extract the uniques from column A to another location: Select cell A1 Goto the menu DataFilterAdvanced filter Select: Copy to another location List range: $A$1:$A$4000 Copy to: $F$1 Unique records only OK Enter this array** formula in G1: =IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "") You'll have to copy across to a number of cells that is equal to the maximum count of any single word from column A. In your posted example WordB appeared the most times, 3. So you'll have to copy the formula across to a total of 3 columns. Then copy down to the end of the list in column F. ** array formulas need to be entered using the key combination of CTRL,SHIFT,ENTER (not just ENTER) Biff "Joe" wrote in message ... I have two large columns arrays A1:A4000 and B1:B4000 A: contains words alphabetically sorted and B: their corresponding page numbers. I need to reduce all duplicates of a word to one row with all their number occurences on adjacent column cells of the same row. For example A B C D E F word A 21 word A 21 25 word A 25 to word B 10 15 17 word B 10 word C 3 word B 15 word B 17 word C 3 How do I do this? |
Elimination of Duplicates in an Array and its reduction.
The equation works well.
I noticed that when using DataFilterAdvanced filter after selecting A1 then "Copy to another location" and "Unique records only" ticked It would produce a duplicate of the first two rows if they were actually duplicates as in my example. The list range was $A$1:$A$6 Is this a bug in excel or is it something to do with the array not having a labels row? "T. Valko" wrote: Here's one way. Use the Advanced filter to extract the uniques from column A to another location: Select cell A1 Goto the menu DataFilterAdvanced filter Select: Copy to another location List range: $A$1:$A$4000 Copy to: $F$1 Unique records only OK Enter this array** formula in G1: =IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "") You'll have to copy across to a number of cells that is equal to the maximum count of any single word from column A. In your posted example WordB appeared the most times, 3. So you'll have to copy the formula across to a total of 3 columns. Then copy down to the end of the list in column F. ** array formulas need to be entered using the key combination of CTRL,SHIFT,ENTER (not just ENTER) Biff "Joe" wrote in message ... I have two large columns arrays A1:A4000 and B1:B4000 A: contains words alphabetically sorted and B: their corresponding page numbers. I need to reduce all duplicates of a word to one row with all their number occurences on adjacent column cells of the same row. For example A B C D E F word A 21 word A 21 25 word A 25 to word B 10 15 17 word B 10 word C 3 word B 15 word B 17 word C 3 How do I do this? |
Elimination of Duplicates in an Array and its reduction.
Is this a bug in excel or is it something to do with the
array not having a labels row? Not a bug. The filter "assumes" the first entry is a column label (header) and brings that entry with it. So, if you had: WordA WordA The filter assumes the first WordA is the column header and the second WordA is a unique entry. So it extracts both. Usually when you don't have a "distinctive" header Excel will "complain" and ask you if your data has a header and how you want to handle it. Biff "Joe" wrote in message ... The equation works well. I noticed that when using DataFilterAdvanced filter after selecting A1 then "Copy to another location" and "Unique records only" ticked It would produce a duplicate of the first two rows if they were actually duplicates as in my example. The list range was $A$1:$A$6 Is this a bug in excel or is it something to do with the array not having a labels row? "T. Valko" wrote: Here's one way. Use the Advanced filter to extract the uniques from column A to another location: Select cell A1 Goto the menu DataFilterAdvanced filter Select: Copy to another location List range: $A$1:$A$4000 Copy to: $F$1 Unique records only OK Enter this array** formula in G1: =IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "") You'll have to copy across to a number of cells that is equal to the maximum count of any single word from column A. In your posted example WordB appeared the most times, 3. So you'll have to copy the formula across to a total of 3 columns. Then copy down to the end of the list in column F. ** array formulas need to be entered using the key combination of CTRL,SHIFT,ENTER (not just ENTER) Biff "Joe" wrote in message ... I have two large columns arrays A1:A4000 and B1:B4000 A: contains words alphabetically sorted and B: their corresponding page numbers. I need to reduce all duplicates of a word to one row with all their number occurences on adjacent column cells of the same row. For example A B C D E F word A 21 word A 21 25 word A 25 to word B 10 15 17 word B 10 word C 3 word B 15 word B 17 word C 3 How do I do this? |
Elimination of Duplicates in an Array and its reduction.
The other thing a forgot to mention is that when pasting your formula into
the large array (20X4000) it takes literally many minutes to load and calculate and can hang the program if it runs out of resources (memory?). I have 2GB of physical memory. There are no other applications running at the same time apart from the usual background ones. "T. Valko" wrote: Is this a bug in excel or is it something to do with the array not having a labels row? Not a bug. The filter "assumes" the first entry is a column label (header) and brings that entry with it. So, if you had: WordA WordA The filter assumes the first WordA is the column header and the second WordA is a unique entry. So it extracts both. Usually when you don't have a "distinctive" header Excel will "complain" and ask you if your data has a header and how you want to handle it. Biff "Joe" wrote in message ... The equation works well. I noticed that when using DataFilterAdvanced filter after selecting A1 then "Copy to another location" and "Unique records only" ticked It would produce a duplicate of the first two rows if they were actually duplicates as in my example. The list range was $A$1:$A$6 Is this a bug in excel or is it something to do with the array not having a labels row? "T. Valko" wrote: Here's one way. Use the Advanced filter to extract the uniques from column A to another location: Select cell A1 Goto the menu DataFilterAdvanced filter Select: Copy to another location List range: $A$1:$A$4000 Copy to: $F$1 Unique records only OK Enter this array** formula in G1: =IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "") You'll have to copy across to a number of cells that is equal to the maximum count of any single word from column A. In your posted example WordB appeared the most times, 3. So you'll have to copy the formula across to a total of 3 columns. Then copy down to the end of the list in column F. ** array formulas need to be entered using the key combination of CTRL,SHIFT,ENTER (not just ENTER) Biff "Joe" wrote in message ... I have two large columns arrays A1:A4000 and B1:B4000 A: contains words alphabetically sorted and B: their corresponding page numbers. I need to reduce all duplicates of a word to one row with all their number occurences on adjacent column cells of the same row. For example A B C D E F word A 21 word A 21 25 word A 25 to word B 10 15 17 word B 10 word C 3 word B 15 word B 17 word C 3 How do I do this? |
Elimination of Duplicates in an Array and its reduction.
That's a pretty large range of array formulas.
I thought you had 4000 rows that included dupe words so there shouldn't be 4000 unique word entries. Or, is that possible? Anyhow, you can try a couple of different things which are *slightly* better than what you have now. I didn't test this on a dataset the same size as yours so I can't say you'll get *significantly* better performance but these formulas will calculate *slightly* faster. Assume the uniques have been extracted to column F starting in F1. Use column G as a helper column to get the counts of the words. This will eliminate the 1000's of COUNTIFs from the array formula: Enter in G1 and copy down as needed: =COUNTIF(A$1:A$4000,F1) Change the array formula (now entered in H1) to: =IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,SMALL(IF ($A$1:$A$4000=$F1,ROW(B$1:B$4000)),COLUMNS($A:A))) ,"") Now, this is assuming that your actual range of data is really where you said it was, A1:B4000. If the actual range does not start on row 1 then we have to calculate for the offset and doing that actually makes things worse. If this is a one time process, after you get the data extracted you can convert the formulas to constants. Select the range of formulas Goto EditCopy Then EditPaste SpecialValuesOK If this is a repetitive process a pivot table may be a better solution. I can't really help with a pt because I hate them and rarely ever use them. Pivot tables aren't dynamic either. Biff "Joe" wrote in message ... The other thing a forgot to mention is that when pasting your formula into the large array (20X4000) it takes literally many minutes to load and calculate and can hang the program if it runs out of resources (memory?). I have 2GB of physical memory. There are no other applications running at the same time apart from the usual background ones. "T. Valko" wrote: Is this a bug in excel or is it something to do with the array not having a labels row? Not a bug. The filter "assumes" the first entry is a column label (header) and brings that entry with it. So, if you had: WordA WordA The filter assumes the first WordA is the column header and the second WordA is a unique entry. So it extracts both. Usually when you don't have a "distinctive" header Excel will "complain" and ask you if your data has a header and how you want to handle it. Biff "Joe" wrote in message ... The equation works well. I noticed that when using DataFilterAdvanced filter after selecting A1 then "Copy to another location" and "Unique records only" ticked It would produce a duplicate of the first two rows if they were actually duplicates as in my example. The list range was $A$1:$A$6 Is this a bug in excel or is it something to do with the array not having a labels row? "T. Valko" wrote: Here's one way. Use the Advanced filter to extract the uniques from column A to another location: Select cell A1 Goto the menu DataFilterAdvanced filter Select: Copy to another location List range: $A$1:$A$4000 Copy to: $F$1 Unique records only OK Enter this array** formula in G1: =IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "") You'll have to copy across to a number of cells that is equal to the maximum count of any single word from column A. In your posted example WordB appeared the most times, 3. So you'll have to copy the formula across to a total of 3 columns. Then copy down to the end of the list in column F. ** array formulas need to be entered using the key combination of CTRL,SHIFT,ENTER (not just ENTER) Biff "Joe" wrote in message ... I have two large columns arrays A1:A4000 and B1:B4000 A: contains words alphabetically sorted and B: their corresponding page numbers. I need to reduce all duplicates of a word to one row with all their number occurences on adjacent column cells of the same row. For example A B C D E F word A 21 word A 21 25 word A 25 to word B 10 15 17 word B 10 word C 3 word B 15 word B 17 word C 3 How do I do this? |
Elimination of Duplicates in an Array and its reduction.
the large array (20X4000) it takes literally many minutes to load...
Hi. Just something different. In Excel 2007, you can probably make a Pivot table with all unique words down, and all page numbers across. (Most likely exceeding 256 Columns) All a macro would have to do is shift all the page numbers over to the left in place of all the blanks. Not tested, but I don't see it taking more than 2-3 seconds. -- HTH :) Dana DeLouis Windows XP & Office 2007 "Joe" wrote in message ... The other thing a forgot to mention is that when pasting your formula into the large array (20X4000) it takes literally many minutes to load and calculate and can hang the program if it runs out of resources (memory?). I have 2GB of physical memory. There are no other applications running at the same time apart from the usual background ones. "T. Valko" wrote: Is this a bug in excel or is it something to do with the array not having a labels row? Not a bug. The filter "assumes" the first entry is a column label (header) and brings that entry with it. So, if you had: WordA WordA The filter assumes the first WordA is the column header and the second WordA is a unique entry. So it extracts both. Usually when you don't have a "distinctive" header Excel will "complain" and ask you if your data has a header and how you want to handle it. Biff "Joe" wrote in message ... The equation works well. I noticed that when using DataFilterAdvanced filter after selecting A1 then "Copy to another location" and "Unique records only" ticked It would produce a duplicate of the first two rows if they were actually duplicates as in my example. The list range was $A$1:$A$6 Is this a bug in excel or is it something to do with the array not having a labels row? "T. Valko" wrote: Here's one way. Use the Advanced filter to extract the uniques from column A to another location: Select cell A1 Goto the menu DataFilterAdvanced filter Select: Copy to another location List range: $A$1:$A$4000 Copy to: $F$1 Unique records only OK Enter this array** formula in G1: =IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "") You'll have to copy across to a number of cells that is equal to the maximum count of any single word from column A. In your posted example WordB appeared the most times, 3. So you'll have to copy the formula across to a total of 3 columns. Then copy down to the end of the list in column F. ** array formulas need to be entered using the key combination of CTRL,SHIFT,ENTER (not just ENTER) Biff "Joe" wrote in message ... I have two large columns arrays A1:A4000 and B1:B4000 A: contains words alphabetically sorted and B: their corresponding page numbers. I need to reduce all duplicates of a word to one row with all their number occurences on adjacent column cells of the same row. For example A B C D E F word A 21 word A 21 25 word A 25 to word B 10 15 17 word B 10 word C 3 word B 15 word B 17 word C 3 How do I do this? |
Elimination of Duplicates in an Array and its reduction.
I have two large columns arrays A1:A4000 and B1:B4000 A: contains words
alphabetically sorted Well, if I had been paying attention...... Being sorted makes a huge difference. We can use a much more efficient non-array formula to do this. Since you have a large number of formulas we should still use the helper column to get the count of each word: Use column G as a helper column to get the counts of the words. This will eliminate the 1000's of COUNTIFs from the array formula: Enter in G1 and copy down as needed: =COUNTIF(A$1:A$4000,F1) You can hide column G if you want. Then enter this formula in H1 (normally entered, not an array): =IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,MATCH($F 1,$A$1:$A$4000,0)+COLUMNS($A:A)-1),"") Copy across then down. This should improve performance "significantly" over the previous method(s) although a pivot table may still be the best solution. Biff "T. Valko" wrote in message ... That's a pretty large range of array formulas. I thought you had 4000 rows that included dupe words so there shouldn't be 4000 unique word entries. Or, is that possible? Anyhow, you can try a couple of different things which are *slightly* better than what you have now. I didn't test this on a dataset the same size as yours so I can't say you'll get *significantly* better performance but these formulas will calculate *slightly* faster. Assume the uniques have been extracted to column F starting in F1. Use column G as a helper column to get the counts of the words. This will eliminate the 1000's of COUNTIFs from the array formula: Enter in G1 and copy down as needed: =COUNTIF(A$1:A$4000,F1) Change the array formula (now entered in H1) to: =IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,SMALL(IF ($A$1:$A$4000=$F1,ROW(B$1:B$4000)),COLUMNS($A:A))) ,"") Now, this is assuming that your actual range of data is really where you said it was, A1:B4000. If the actual range does not start on row 1 then we have to calculate for the offset and doing that actually makes things worse. If this is a one time process, after you get the data extracted you can convert the formulas to constants. Select the range of formulas Goto EditCopy Then EditPaste SpecialValuesOK If this is a repetitive process a pivot table may be a better solution. I can't really help with a pt because I hate them and rarely ever use them. Pivot tables aren't dynamic either. Biff "Joe" wrote in message ... The other thing a forgot to mention is that when pasting your formula into the large array (20X4000) it takes literally many minutes to load and calculate and can hang the program if it runs out of resources (memory?). I have 2GB of physical memory. There are no other applications running at the same time apart from the usual background ones. "T. Valko" wrote: Is this a bug in excel or is it something to do with the array not having a labels row? Not a bug. The filter "assumes" the first entry is a column label (header) and brings that entry with it. So, if you had: WordA WordA The filter assumes the first WordA is the column header and the second WordA is a unique entry. So it extracts both. Usually when you don't have a "distinctive" header Excel will "complain" and ask you if your data has a header and how you want to handle it. Biff "Joe" wrote in message ... The equation works well. I noticed that when using DataFilterAdvanced filter after selecting A1 then "Copy to another location" and "Unique records only" ticked It would produce a duplicate of the first two rows if they were actually duplicates as in my example. The list range was $A$1:$A$6 Is this a bug in excel or is it something to do with the array not having a labels row? "T. Valko" wrote: Here's one way. Use the Advanced filter to extract the uniques from column A to another location: Select cell A1 Goto the menu DataFilterAdvanced filter Select: Copy to another location List range: $A$1:$A$4000 Copy to: $F$1 Unique records only OK Enter this array** formula in G1: =IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "") You'll have to copy across to a number of cells that is equal to the maximum count of any single word from column A. In your posted example WordB appeared the most times, 3. So you'll have to copy the formula across to a total of 3 columns. Then copy down to the end of the list in column F. ** array formulas need to be entered using the key combination of CTRL,SHIFT,ENTER (not just ENTER) Biff "Joe" wrote in message ... I have two large columns arrays A1:A4000 and B1:B4000 A: contains words alphabetically sorted and B: their corresponding page numbers. I need to reduce all duplicates of a word to one row with all their number occurences on adjacent column cells of the same row. For example A B C D E F word A 21 word A 21 25 word A 25 to word B 10 15 17 word B 10 word C 3 word B 15 word B 17 word C 3 How do I do this? |
Elimination of Duplicates in an Array and its reduction.
I thought you had 4000 rows that included dupe words so there shouldn't be
4000 unique word entries. Or, is that possible? After removing duplicates the original array melts down from 3795 to 1961 That means that some duplicates were as high as 100. So the array pans out to 100 columns wide X 1961 rows. This new approach is very much faster. Brilliantly fast compared to the other algorithim. The problem before was that it was hangging the OS and I had to reboot many times. It have stayed away from pivot tables so I'll stay away until forced to use them. Many thanks "T. Valko" wrote: I have two large columns arrays A1:A4000 and B1:B4000 A: contains words alphabetically sorted Well, if I had been paying attention...... Being sorted makes a huge difference. We can use a much more efficient non-array formula to do this. Since you have a large number of formulas we should still use the helper column to get the count of each word: Use column G as a helper column to get the counts of the words. This will eliminate the 1000's of COUNTIFs from the array formula: Enter in G1 and copy down as needed: =COUNTIF(A$1:A$4000,F1) You can hide column G if you want. Then enter this formula in H1 (normally entered, not an array): =IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,MATCH($F 1,$A$1:$A$4000,0)+COLUMNS($A:A)-1),"") Copy across then down. This should improve performance "significantly" over the previous method(s) although a pivot table may still be the best solution. Biff "T. Valko" wrote in message ... That's a pretty large range of array formulas. I thought you had 4000 rows that included dupe words so there shouldn't be 4000 unique word entries. Or, is that possible? Anyhow, you can try a couple of different things which are *slightly* better than what you have now. I didn't test this on a dataset the same size as yours so I can't say you'll get *significantly* better performance but these formulas will calculate *slightly* faster. Assume the uniques have been extracted to column F starting in F1. Use column G as a helper column to get the counts of the words. This will eliminate the 1000's of COUNTIFs from the array formula: Enter in G1 and copy down as needed: =COUNTIF(A$1:A$4000,F1) Change the array formula (now entered in H1) to: =IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,SMALL(IF ($A$1:$A$4000=$F1,ROW(B$1:B$4000)),COLUMNS($A:A))) ,"") Now, this is assuming that your actual range of data is really where you said it was, A1:B4000. If the actual range does not start on row 1 then we have to calculate for the offset and doing that actually makes things worse. If this is a one time process, after you get the data extracted you can convert the formulas to constants. Select the range of formulas Goto EditCopy Then EditPaste SpecialValuesOK If this is a repetitive process a pivot table may be a better solution. I can't really help with a pt because I hate them and rarely ever use them. Pivot tables aren't dynamic either. Biff "Joe" wrote in message ... The other thing a forgot to mention is that when pasting your formula into the large array (20X4000) it takes literally many minutes to load and calculate and can hang the program if it runs out of resources (memory?). I have 2GB of physical memory. There are no other applications running at the same time apart from the usual background ones. "T. Valko" wrote: Is this a bug in excel or is it something to do with the array not having a labels row? Not a bug. The filter "assumes" the first entry is a column label (header) and brings that entry with it. So, if you had: WordA WordA The filter assumes the first WordA is the column header and the second WordA is a unique entry. So it extracts both. Usually when you don't have a "distinctive" header Excel will "complain" and ask you if your data has a header and how you want to handle it. Biff "Joe" wrote in message ... The equation works well. I noticed that when using DataFilterAdvanced filter after selecting A1 then "Copy to another location" and "Unique records only" ticked It would produce a duplicate of the first two rows if they were actually duplicates as in my example. The list range was $A$1:$A$6 Is this a bug in excel or is it something to do with the array not having a labels row? "T. Valko" wrote: Here's one way. Use the Advanced filter to extract the uniques from column A to another location: Select cell A1 Goto the menu DataFilterAdvanced filter Select: Copy to another location List range: $A$1:$A$4000 Copy to: $F$1 Unique records only OK Enter this array** formula in G1: =IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "") You'll have to copy across to a number of cells that is equal to the maximum count of any single word from column A. In your posted example WordB appeared the most times, 3. So you'll have to copy the formula across to a total of 3 columns. Then copy down to the end of the list in column F. ** array formulas need to be entered using the key combination of CTRL,SHIFT,ENTER (not just ENTER) Biff "Joe" wrote in message ... I have two large columns arrays A1:A4000 and B1:B4000 A: contains words alphabetically sorted and B: their corresponding page numbers. I need to reduce all duplicates of a word to one row with all their number occurences on adjacent column cells of the same row. For example A B C D E F word A 21 word A 21 25 word A 25 to word B 10 15 17 word B 10 word C 3 word B 15 word B 17 word C 3 How do I do this? |
Elimination of Duplicates in an Array and its reduction.
So the array pans out to 100 columns wide X 1961 rows.
Wow! That's still a huge number of formulas! I can see why the array formulas were causing so much trouble. The data being sorted made all the difference. As far as pivot tables go, they *can* save you from using up a lot of calculation resources but the thing I *hate* about them is that almost all the pts I've seen look like an unorganized mess. Bleh! Thanks for the feedback! Biff "Joe" wrote in message ... I thought you had 4000 rows that included dupe words so there shouldn't be 4000 unique word entries. Or, is that possible? After removing duplicates the original array melts down from 3795 to 1961 That means that some duplicates were as high as 100. So the array pans out to 100 columns wide X 1961 rows. This new approach is very much faster. Brilliantly fast compared to the other algorithim. The problem before was that it was hangging the OS and I had to reboot many times. It have stayed away from pivot tables so I'll stay away until forced to use them. Many thanks "T. Valko" wrote: I have two large columns arrays A1:A4000 and B1:B4000 A: contains words alphabetically sorted Well, if I had been paying attention...... Being sorted makes a huge difference. We can use a much more efficient non-array formula to do this. Since you have a large number of formulas we should still use the helper column to get the count of each word: Use column G as a helper column to get the counts of the words. This will eliminate the 1000's of COUNTIFs from the array formula: Enter in G1 and copy down as needed: =COUNTIF(A$1:A$4000,F1) You can hide column G if you want. Then enter this formula in H1 (normally entered, not an array): =IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,MATCH($F 1,$A$1:$A$4000,0)+COLUMNS($A:A)-1),"") Copy across then down. This should improve performance "significantly" over the previous method(s) although a pivot table may still be the best solution. Biff "T. Valko" wrote in message ... That's a pretty large range of array formulas. I thought you had 4000 rows that included dupe words so there shouldn't be 4000 unique word entries. Or, is that possible? Anyhow, you can try a couple of different things which are *slightly* better than what you have now. I didn't test this on a dataset the same size as yours so I can't say you'll get *significantly* better performance but these formulas will calculate *slightly* faster. Assume the uniques have been extracted to column F starting in F1. Use column G as a helper column to get the counts of the words. This will eliminate the 1000's of COUNTIFs from the array formula: Enter in G1 and copy down as needed: =COUNTIF(A$1:A$4000,F1) Change the array formula (now entered in H1) to: =IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,SMALL(IF ($A$1:$A$4000=$F1,ROW(B$1:B$4000)),COLUMNS($A:A))) ,"") Now, this is assuming that your actual range of data is really where you said it was, A1:B4000. If the actual range does not start on row 1 then we have to calculate for the offset and doing that actually makes things worse. If this is a one time process, after you get the data extracted you can convert the formulas to constants. Select the range of formulas Goto EditCopy Then EditPaste SpecialValuesOK If this is a repetitive process a pivot table may be a better solution. I can't really help with a pt because I hate them and rarely ever use them. Pivot tables aren't dynamic either. Biff "Joe" wrote in message ... The other thing a forgot to mention is that when pasting your formula into the large array (20X4000) it takes literally many minutes to load and calculate and can hang the program if it runs out of resources (memory?). I have 2GB of physical memory. There are no other applications running at the same time apart from the usual background ones. "T. Valko" wrote: Is this a bug in excel or is it something to do with the array not having a labels row? Not a bug. The filter "assumes" the first entry is a column label (header) and brings that entry with it. So, if you had: WordA WordA The filter assumes the first WordA is the column header and the second WordA is a unique entry. So it extracts both. Usually when you don't have a "distinctive" header Excel will "complain" and ask you if your data has a header and how you want to handle it. Biff "Joe" wrote in message ... The equation works well. I noticed that when using DataFilterAdvanced filter after selecting A1 then "Copy to another location" and "Unique records only" ticked It would produce a duplicate of the first two rows if they were actually duplicates as in my example. The list range was $A$1:$A$6 Is this a bug in excel or is it something to do with the array not having a labels row? "T. Valko" wrote: Here's one way. Use the Advanced filter to extract the uniques from column A to another location: Select cell A1 Goto the menu DataFilterAdvanced filter Select: Copy to another location List range: $A$1:$A$4000 Copy to: $F$1 Unique records only OK Enter this array** formula in G1: =IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "") You'll have to copy across to a number of cells that is equal to the maximum count of any single word from column A. In your posted example WordB appeared the most times, 3. So you'll have to copy the formula across to a total of 3 columns. Then copy down to the end of the list in column F. ** array formulas need to be entered using the key combination of CTRL,SHIFT,ENTER (not just ENTER) Biff "Joe" wrote in message ... I have two large columns arrays A1:A4000 and B1:B4000 A: contains words alphabetically sorted and B: their corresponding page numbers. I need to reduce all duplicates of a word to one row with all their number occurences on adjacent column cells of the same row. For example A B C D E F word A 21 word A 21 25 word A 25 to word B 10 15 17 word B 10 word C 3 word B 15 word B 17 word C 3 How do I do this? |
All times are GMT +1. The time now is 12:07 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com