ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Discussion (Misc queries) (https://www.excelbanter.com/excel-discussion-misc-queries/)
-   -   Elimination of Duplicates in an Array and its reduction. (https://www.excelbanter.com/excel-discussion-misc-queries/139877-elimination-duplicates-array-its-reduction.html)

Joe

Elimination of Duplicates in an Array and its reduction.
 
I have two large columns arrays A1:A4000 and B1:B4000
A: contains words alphabetically sorted and B: their corresponding page
numbers.
I need to reduce all duplicates of a word to one row with all their number
occurences on adjacent column cells
of the same row.
For example
A B C D E F
word A 21 word A 21 25
word A 25 to word B 10 15 17
word B 10 word C 3
word B 15
word B 17
word C 3
How do I do this?


Joe

Elimination of Duplicates in an Array and its reduction.
 
The formating in my example in this post got mangled!
I'll try again.
A B
WordA 21
WordA 25
WordB 10
WordB 15
WordB 17
WordC 3

Down to

C D E F
WordA 21 25
WordB 10 15 17
WordC 3



"Joe" wrote:

I have two large columns arrays A1:A4000 and B1:B4000
A: contains words alphabetically sorted and B: their corresponding page
numbers.
I need to reduce all duplicates of a word to one row with all their number
occurences on adjacent column cells
of the same row.
For example
A B C D E F
word A 21 word A 21 25
word A 25 to word B 10 15 17
word B 10 word C 3
word B 15
word B 17
word C 3
How do I do this?


T. Valko

Elimination of Duplicates in an Array and its reduction.
 
Here's one way.

Use the Advanced filter to extract the uniques from column A to another
location:

Select cell A1
Goto the menu DataFilterAdvanced filter
Select: Copy to another location
List range: $A$1:$A$4000
Copy to: $F$1
Unique records only
OK

Enter this array** formula in G1:

=IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "")

You'll have to copy across to a number of cells that is equal to the maximum
count of any single word from column A. In your posted example WordB
appeared the most times, 3. So you'll have to copy the formula across to a
total of 3 columns. Then copy down to the end of the list in column F.

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER)

Biff

"Joe" wrote in message
...
I have two large columns arrays A1:A4000 and B1:B4000
A: contains words alphabetically sorted and B: their corresponding page
numbers.
I need to reduce all duplicates of a word to one row with all their
number
occurences on adjacent column cells
of the same row.
For example
A B C D E F
word A 21 word A 21 25
word A 25 to word B 10 15 17
word B 10 word C 3
word B 15
word B 17
word C 3
How do I do this?




Joe

Elimination of Duplicates in an Array and its reduction.
 
The equation works well.
I noticed that when using DataFilterAdvanced filter after selecting A1
then "Copy to another location" and "Unique records only" ticked
It would produce a duplicate of the first two rows if they were actually
duplicates as in my example. The list range was $A$1:$A$6
Is this a bug in excel or is it something to do with the array not having a
labels row?

"T. Valko" wrote:

Here's one way.

Use the Advanced filter to extract the uniques from column A to another
location:

Select cell A1
Goto the menu DataFilterAdvanced filter
Select: Copy to another location
List range: $A$1:$A$4000
Copy to: $F$1
Unique records only
OK

Enter this array** formula in G1:

=IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "")

You'll have to copy across to a number of cells that is equal to the maximum
count of any single word from column A. In your posted example WordB
appeared the most times, 3. So you'll have to copy the formula across to a
total of 3 columns. Then copy down to the end of the list in column F.

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER)

Biff

"Joe" wrote in message
...
I have two large columns arrays A1:A4000 and B1:B4000
A: contains words alphabetically sorted and B: their corresponding page
numbers.
I need to reduce all duplicates of a word to one row with all their
number
occurences on adjacent column cells
of the same row.
For example
A B C D E F
word A 21 word A 21 25
word A 25 to word B 10 15 17
word B 10 word C 3
word B 15
word B 17
word C 3
How do I do this?





T. Valko

Elimination of Duplicates in an Array and its reduction.
 
Is this a bug in excel or is it something to do with the
array not having a labels row?


Not a bug. The filter "assumes" the first entry is a column label (header)
and brings that entry with it. So, if you had:

WordA
WordA

The filter assumes the first WordA is the column header and the second WordA
is a unique entry. So it extracts both.

Usually when you don't have a "distinctive" header Excel will "complain" and
ask you if your data has a header and how you want to handle it.

Biff

"Joe" wrote in message
...
The equation works well.
I noticed that when using DataFilterAdvanced filter after selecting A1
then "Copy to another location" and "Unique records only" ticked
It would produce a duplicate of the first two rows if they were actually
duplicates as in my example. The list range was $A$1:$A$6
Is this a bug in excel or is it something to do with the array not having
a
labels row?

"T. Valko" wrote:

Here's one way.

Use the Advanced filter to extract the uniques from column A to another
location:

Select cell A1
Goto the menu DataFilterAdvanced filter
Select: Copy to another location
List range: $A$1:$A$4000
Copy to: $F$1
Unique records only
OK

Enter this array** formula in G1:

=IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "")

You'll have to copy across to a number of cells that is equal to the
maximum
count of any single word from column A. In your posted example WordB
appeared the most times, 3. So you'll have to copy the formula across to
a
total of 3 columns. Then copy down to the end of the list in column F.

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER)

Biff

"Joe" wrote in message
...
I have two large columns arrays A1:A4000 and B1:B4000
A: contains words alphabetically sorted and B: their corresponding
page
numbers.
I need to reduce all duplicates of a word to one row with all their
number
occurences on adjacent column cells
of the same row.
For example
A B C D E F
word A 21 word A 21 25
word A 25 to word B 10 15 17
word B 10 word C 3
word B 15
word B 17
word C 3
How do I do this?







Joe

Elimination of Duplicates in an Array and its reduction.
 
The other thing a forgot to mention is that when pasting your formula into
the large array (20X4000) it takes literally many minutes to load and
calculate and can hang the program if it runs out of resources (memory?). I
have 2GB of physical memory. There are no other applications running at the
same time apart from the usual background ones.

"T. Valko" wrote:

Is this a bug in excel or is it something to do with the
array not having a labels row?


Not a bug. The filter "assumes" the first entry is a column label (header)
and brings that entry with it. So, if you had:

WordA
WordA

The filter assumes the first WordA is the column header and the second WordA
is a unique entry. So it extracts both.

Usually when you don't have a "distinctive" header Excel will "complain" and
ask you if your data has a header and how you want to handle it.

Biff

"Joe" wrote in message
...
The equation works well.
I noticed that when using DataFilterAdvanced filter after selecting A1
then "Copy to another location" and "Unique records only" ticked
It would produce a duplicate of the first two rows if they were actually
duplicates as in my example. The list range was $A$1:$A$6
Is this a bug in excel or is it something to do with the array not having
a
labels row?

"T. Valko" wrote:

Here's one way.

Use the Advanced filter to extract the uniques from column A to another
location:

Select cell A1
Goto the menu DataFilterAdvanced filter
Select: Copy to another location
List range: $A$1:$A$4000
Copy to: $F$1
Unique records only
OK

Enter this array** formula in G1:

=IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "")

You'll have to copy across to a number of cells that is equal to the
maximum
count of any single word from column A. In your posted example WordB
appeared the most times, 3. So you'll have to copy the formula across to
a
total of 3 columns. Then copy down to the end of the list in column F.

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER)

Biff

"Joe" wrote in message
...
I have two large columns arrays A1:A4000 and B1:B4000
A: contains words alphabetically sorted and B: their corresponding
page
numbers.
I need to reduce all duplicates of a word to one row with all their
number
occurences on adjacent column cells
of the same row.
For example
A B C D E F
word A 21 word A 21 25
word A 25 to word B 10 15 17
word B 10 word C 3
word B 15
word B 17
word C 3
How do I do this?








T. Valko

Elimination of Duplicates in an Array and its reduction.
 
That's a pretty large range of array formulas.

I thought you had 4000 rows that included dupe words so there shouldn't be
4000 unique word entries. Or, is that possible?

Anyhow, you can try a couple of different things which are *slightly* better
than what you have now. I didn't test this on a dataset the same size as
yours so I can't say you'll get *significantly* better performance but these
formulas will calculate *slightly* faster.

Assume the uniques have been extracted to column F starting in F1.

Use column G as a helper column to get the counts of the words. This will
eliminate the 1000's of COUNTIFs from the array formula:

Enter in G1 and copy down as needed:

=COUNTIF(A$1:A$4000,F1)

Change the array formula (now entered in H1) to:

=IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,SMALL(IF ($A$1:$A$4000=$F1,ROW(B$1:B$4000)),COLUMNS($A:A))) ,"")

Now, this is assuming that your actual range of data is really where you
said it was, A1:B4000. If the actual range does not start on row 1 then we
have to calculate for the offset and doing that actually makes things worse.

If this is a one time process, after you get the data extracted you can
convert the formulas to constants.

Select the range of formulas
Goto EditCopy
Then EditPaste SpecialValuesOK

If this is a repetitive process a pivot table may be a better solution. I
can't really help with a pt because I hate them and rarely ever use them.
Pivot tables aren't dynamic either.

Biff

"Joe" wrote in message
...
The other thing a forgot to mention is that when pasting your formula into
the large array (20X4000) it takes literally many minutes to load and
calculate and can hang the program if it runs out of resources (memory?).
I
have 2GB of physical memory. There are no other applications running at
the
same time apart from the usual background ones.

"T. Valko" wrote:

Is this a bug in excel or is it something to do with the
array not having a labels row?


Not a bug. The filter "assumes" the first entry is a column label
(header)
and brings that entry with it. So, if you had:

WordA
WordA

The filter assumes the first WordA is the column header and the second
WordA
is a unique entry. So it extracts both.

Usually when you don't have a "distinctive" header Excel will "complain"
and
ask you if your data has a header and how you want to handle it.

Biff

"Joe" wrote in message
...
The equation works well.
I noticed that when using DataFilterAdvanced filter after selecting
A1
then "Copy to another location" and "Unique records only" ticked
It would produce a duplicate of the first two rows if they were
actually
duplicates as in my example. The list range was $A$1:$A$6
Is this a bug in excel or is it something to do with the array not
having
a
labels row?

"T. Valko" wrote:

Here's one way.

Use the Advanced filter to extract the uniques from column A to
another
location:

Select cell A1
Goto the menu DataFilterAdvanced filter
Select: Copy to another location
List range: $A$1:$A$4000
Copy to: $F$1
Unique records only
OK

Enter this array** formula in G1:

=IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "")

You'll have to copy across to a number of cells that is equal to the
maximum
count of any single word from column A. In your posted example WordB
appeared the most times, 3. So you'll have to copy the formula across
to
a
total of 3 columns. Then copy down to the end of the list in column F.

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER)

Biff

"Joe" wrote in message
...
I have two large columns arrays A1:A4000 and B1:B4000
A: contains words alphabetically sorted and B: their corresponding
page
numbers.
I need to reduce all duplicates of a word to one row with all their
number
occurences on adjacent column cells
of the same row.
For example
A B C D E F
word A 21 word A 21 25
word A 25 to word B 10 15 17
word B 10 word C 3
word B 15
word B 17
word C 3
How do I do this?










Dana DeLouis

Elimination of Duplicates in an Array and its reduction.
 
the large array (20X4000) it takes literally many minutes to load...

Hi. Just something different.
In Excel 2007, you can probably make a Pivot table with all unique words
down, and all page numbers across.
(Most likely exceeding 256 Columns)
All a macro would have to do is shift all the page numbers over to the left
in place of all the blanks.
Not tested, but I don't see it taking more than 2-3 seconds.

--
HTH :)
Dana DeLouis
Windows XP & Office 2007


"Joe" wrote in message
...
The other thing a forgot to mention is that when pasting your formula into
the large array (20X4000) it takes literally many minutes to load and
calculate and can hang the program if it runs out of resources (memory?).
I
have 2GB of physical memory. There are no other applications running at
the
same time apart from the usual background ones.

"T. Valko" wrote:

Is this a bug in excel or is it something to do with the
array not having a labels row?


Not a bug. The filter "assumes" the first entry is a column label
(header)
and brings that entry with it. So, if you had:

WordA
WordA

The filter assumes the first WordA is the column header and the second
WordA
is a unique entry. So it extracts both.

Usually when you don't have a "distinctive" header Excel will "complain"
and
ask you if your data has a header and how you want to handle it.

Biff

"Joe" wrote in message
...
The equation works well.
I noticed that when using DataFilterAdvanced filter after selecting
A1
then "Copy to another location" and "Unique records only" ticked
It would produce a duplicate of the first two rows if they were
actually
duplicates as in my example. The list range was $A$1:$A$6
Is this a bug in excel or is it something to do with the array not
having
a
labels row?

"T. Valko" wrote:

Here's one way.

Use the Advanced filter to extract the uniques from column A to
another
location:

Select cell A1
Goto the menu DataFilterAdvanced filter
Select: Copy to another location
List range: $A$1:$A$4000
Copy to: $F$1
Unique records only
OK

Enter this array** formula in G1:

=IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "")

You'll have to copy across to a number of cells that is equal to the
maximum
count of any single word from column A. In your posted example WordB
appeared the most times, 3. So you'll have to copy the formula across
to
a
total of 3 columns. Then copy down to the end of the list in column F.

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER)

Biff

"Joe" wrote in message
...
I have two large columns arrays A1:A4000 and B1:B4000
A: contains words alphabetically sorted and B: their corresponding
page
numbers.
I need to reduce all duplicates of a word to one row with all their
number
occurences on adjacent column cells
of the same row.
For example
A B C D E F
word A 21 word A 21 25
word A 25 to word B 10 15 17
word B 10 word C 3
word B 15
word B 17
word C 3
How do I do this?










T. Valko

Elimination of Duplicates in an Array and its reduction.
 
I have two large columns arrays A1:A4000 and B1:B4000 A: contains words
alphabetically sorted


Well, if I had been paying attention......

Being sorted makes a huge difference. We can use a much more efficient
non-array formula to do this.

Since you have a large number of formulas we should still use the helper
column to get the count of each word:

Use column G as a helper column to get the counts of the words. This will
eliminate the 1000's of COUNTIFs from the array formula:

Enter in G1 and copy down as needed:

=COUNTIF(A$1:A$4000,F1)


You can hide column G if you want.

Then enter this formula in H1 (normally entered, not an array):

=IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,MATCH($F 1,$A$1:$A$4000,0)+COLUMNS($A:A)-1),"")

Copy across then down.

This should improve performance "significantly" over the previous method(s)
although a pivot table may still be the best solution.

Biff

"T. Valko" wrote in message
...
That's a pretty large range of array formulas.

I thought you had 4000 rows that included dupe words so there shouldn't be
4000 unique word entries. Or, is that possible?

Anyhow, you can try a couple of different things which are *slightly*
better than what you have now. I didn't test this on a dataset the same
size as yours so I can't say you'll get *significantly* better performance
but these formulas will calculate *slightly* faster.

Assume the uniques have been extracted to column F starting in F1.

Use column G as a helper column to get the counts of the words. This will
eliminate the 1000's of COUNTIFs from the array formula:

Enter in G1 and copy down as needed:

=COUNTIF(A$1:A$4000,F1)

Change the array formula (now entered in H1) to:

=IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,SMALL(IF ($A$1:$A$4000=$F1,ROW(B$1:B$4000)),COLUMNS($A:A))) ,"")

Now, this is assuming that your actual range of data is really where you
said it was, A1:B4000. If the actual range does not start on row 1 then we
have to calculate for the offset and doing that actually makes things
worse.

If this is a one time process, after you get the data extracted you can
convert the formulas to constants.

Select the range of formulas
Goto EditCopy
Then EditPaste SpecialValuesOK

If this is a repetitive process a pivot table may be a better solution. I
can't really help with a pt because I hate them and rarely ever use them.
Pivot tables aren't dynamic either.

Biff

"Joe" wrote in message
...
The other thing a forgot to mention is that when pasting your formula
into
the large array (20X4000) it takes literally many minutes to load and
calculate and can hang the program if it runs out of resources (memory?).
I
have 2GB of physical memory. There are no other applications running at
the
same time apart from the usual background ones.

"T. Valko" wrote:

Is this a bug in excel or is it something to do with the
array not having a labels row?

Not a bug. The filter "assumes" the first entry is a column label
(header)
and brings that entry with it. So, if you had:

WordA
WordA

The filter assumes the first WordA is the column header and the second
WordA
is a unique entry. So it extracts both.

Usually when you don't have a "distinctive" header Excel will "complain"
and
ask you if your data has a header and how you want to handle it.

Biff

"Joe" wrote in message
...
The equation works well.
I noticed that when using DataFilterAdvanced filter after selecting
A1
then "Copy to another location" and "Unique records only" ticked
It would produce a duplicate of the first two rows if they were
actually
duplicates as in my example. The list range was $A$1:$A$6
Is this a bug in excel or is it something to do with the array not
having
a
labels row?

"T. Valko" wrote:

Here's one way.

Use the Advanced filter to extract the uniques from column A to
another
location:

Select cell A1
Goto the menu DataFilterAdvanced filter
Select: Copy to another location
List range: $A$1:$A$4000
Copy to: $F$1
Unique records only
OK

Enter this array** formula in G1:

=IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "")

You'll have to copy across to a number of cells that is equal to the
maximum
count of any single word from column A. In your posted example WordB
appeared the most times, 3. So you'll have to copy the formula across
to
a
total of 3 columns. Then copy down to the end of the list in column
F.

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER)

Biff

"Joe" wrote in message
...
I have two large columns arrays A1:A4000 and B1:B4000
A: contains words alphabetically sorted and B: their corresponding
page
numbers.
I need to reduce all duplicates of a word to one row with all
their
number
occurences on adjacent column cells
of the same row.
For example
A B C D E F
word A 21 word A 21 25
word A 25 to word B 10 15 17
word B 10 word C 3
word B 15
word B 17
word C 3
How do I do this?












Joe

Elimination of Duplicates in an Array and its reduction.
 
I thought you had 4000 rows that included dupe words so there shouldn't be
4000 unique word entries. Or, is that possible?


After removing duplicates the original array melts down from 3795 to 1961
That means that some duplicates were as high as 100. So the array pans out
to 100 columns wide X 1961 rows.

This new approach is very much faster. Brilliantly fast compared to the
other algorithim. The problem before was that it was hangging the OS and I
had to reboot many times. It have stayed away from pivot tables so I'll stay
away until forced to use them.
Many thanks

"T. Valko" wrote:

I have two large columns arrays A1:A4000 and B1:B4000 A: contains words
alphabetically sorted


Well, if I had been paying attention......

Being sorted makes a huge difference. We can use a much more efficient
non-array formula to do this.

Since you have a large number of formulas we should still use the helper
column to get the count of each word:

Use column G as a helper column to get the counts of the words. This will
eliminate the 1000's of COUNTIFs from the array formula:

Enter in G1 and copy down as needed:

=COUNTIF(A$1:A$4000,F1)


You can hide column G if you want.

Then enter this formula in H1 (normally entered, not an array):

=IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,MATCH($F 1,$A$1:$A$4000,0)+COLUMNS($A:A)-1),"")

Copy across then down.

This should improve performance "significantly" over the previous method(s)
although a pivot table may still be the best solution.

Biff

"T. Valko" wrote in message
...
That's a pretty large range of array formulas.

I thought you had 4000 rows that included dupe words so there shouldn't be
4000 unique word entries. Or, is that possible?

Anyhow, you can try a couple of different things which are *slightly*
better than what you have now. I didn't test this on a dataset the same
size as yours so I can't say you'll get *significantly* better performance
but these formulas will calculate *slightly* faster.

Assume the uniques have been extracted to column F starting in F1.

Use column G as a helper column to get the counts of the words. This will
eliminate the 1000's of COUNTIFs from the array formula:

Enter in G1 and copy down as needed:

=COUNTIF(A$1:A$4000,F1)

Change the array formula (now entered in H1) to:

=IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,SMALL(IF ($A$1:$A$4000=$F1,ROW(B$1:B$4000)),COLUMNS($A:A))) ,"")

Now, this is assuming that your actual range of data is really where you
said it was, A1:B4000. If the actual range does not start on row 1 then we
have to calculate for the offset and doing that actually makes things
worse.

If this is a one time process, after you get the data extracted you can
convert the formulas to constants.

Select the range of formulas
Goto EditCopy
Then EditPaste SpecialValuesOK

If this is a repetitive process a pivot table may be a better solution. I
can't really help with a pt because I hate them and rarely ever use them.
Pivot tables aren't dynamic either.

Biff

"Joe" wrote in message
...
The other thing a forgot to mention is that when pasting your formula
into
the large array (20X4000) it takes literally many minutes to load and
calculate and can hang the program if it runs out of resources (memory?).
I
have 2GB of physical memory. There are no other applications running at
the
same time apart from the usual background ones.

"T. Valko" wrote:

Is this a bug in excel or is it something to do with the
array not having a labels row?

Not a bug. The filter "assumes" the first entry is a column label
(header)
and brings that entry with it. So, if you had:

WordA
WordA

The filter assumes the first WordA is the column header and the second
WordA
is a unique entry. So it extracts both.

Usually when you don't have a "distinctive" header Excel will "complain"
and
ask you if your data has a header and how you want to handle it.

Biff

"Joe" wrote in message
...
The equation works well.
I noticed that when using DataFilterAdvanced filter after selecting
A1
then "Copy to another location" and "Unique records only" ticked
It would produce a duplicate of the first two rows if they were
actually
duplicates as in my example. The list range was $A$1:$A$6
Is this a bug in excel or is it something to do with the array not
having
a
labels row?

"T. Valko" wrote:

Here's one way.

Use the Advanced filter to extract the uniques from column A to
another
location:

Select cell A1
Goto the menu DataFilterAdvanced filter
Select: Copy to another location
List range: $A$1:$A$4000
Copy to: $F$1
Unique records only
OK

Enter this array** formula in G1:

=IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "")

You'll have to copy across to a number of cells that is equal to the
maximum
count of any single word from column A. In your posted example WordB
appeared the most times, 3. So you'll have to copy the formula across
to
a
total of 3 columns. Then copy down to the end of the list in column
F.

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER)

Biff

"Joe" wrote in message
...
I have two large columns arrays A1:A4000 and B1:B4000
A: contains words alphabetically sorted and B: their corresponding
page
numbers.
I need to reduce all duplicates of a word to one row with all
their
number
occurences on adjacent column cells
of the same row.
For example
A B C D E F
word A 21 word A 21 25
word A 25 to word B 10 15 17
word B 10 word C 3
word B 15
word B 17
word C 3
How do I do this?













T. Valko

Elimination of Duplicates in an Array and its reduction.
 
So the array pans out to 100 columns wide X 1961 rows.

Wow! That's still a huge number of formulas!

I can see why the array formulas were causing so much trouble. The data
being sorted made all the difference.

As far as pivot tables go, they *can* save you from using up a lot of
calculation resources but the thing I *hate* about them is that almost all
the pts I've seen look like an unorganized mess. Bleh!

Thanks for the feedback!

Biff

"Joe" wrote in message
...
I thought you had 4000 rows that included dupe words so there shouldn't
be
4000 unique word entries. Or, is that possible?


After removing duplicates the original array melts down from 3795 to 1961
That means that some duplicates were as high as 100. So the array pans out
to 100 columns wide X 1961 rows.

This new approach is very much faster. Brilliantly fast compared to the
other algorithim. The problem before was that it was hangging the OS and I
had to reboot many times. It have stayed away from pivot tables so I'll
stay
away until forced to use them.
Many thanks

"T. Valko" wrote:

I have two large columns arrays A1:A4000 and B1:B4000 A: contains
words
alphabetically sorted


Well, if I had been paying attention......

Being sorted makes a huge difference. We can use a much more efficient
non-array formula to do this.

Since you have a large number of formulas we should still use the helper
column to get the count of each word:

Use column G as a helper column to get the counts of the words. This
will
eliminate the 1000's of COUNTIFs from the array formula:

Enter in G1 and copy down as needed:

=COUNTIF(A$1:A$4000,F1)


You can hide column G if you want.

Then enter this formula in H1 (normally entered, not an array):

=IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,MATCH($F 1,$A$1:$A$4000,0)+COLUMNS($A:A)-1),"")

Copy across then down.

This should improve performance "significantly" over the previous
method(s)
although a pivot table may still be the best solution.

Biff

"T. Valko" wrote in message
...
That's a pretty large range of array formulas.

I thought you had 4000 rows that included dupe words so there shouldn't
be
4000 unique word entries. Or, is that possible?

Anyhow, you can try a couple of different things which are *slightly*
better than what you have now. I didn't test this on a dataset the same
size as yours so I can't say you'll get *significantly* better
performance
but these formulas will calculate *slightly* faster.

Assume the uniques have been extracted to column F starting in F1.

Use column G as a helper column to get the counts of the words. This
will
eliminate the 1000's of COUNTIFs from the array formula:

Enter in G1 and copy down as needed:

=COUNTIF(A$1:A$4000,F1)

Change the array formula (now entered in H1) to:

=IF(COLUMNS($A:A)<=$G1,INDEX($B$1:$B$4000,SMALL(IF ($A$1:$A$4000=$F1,ROW(B$1:B$4000)),COLUMNS($A:A))) ,"")

Now, this is assuming that your actual range of data is really where
you
said it was, A1:B4000. If the actual range does not start on row 1 then
we
have to calculate for the offset and doing that actually makes things
worse.

If this is a one time process, after you get the data extracted you can
convert the formulas to constants.

Select the range of formulas
Goto EditCopy
Then EditPaste SpecialValuesOK

If this is a repetitive process a pivot table may be a better solution.
I
can't really help with a pt because I hate them and rarely ever use
them.
Pivot tables aren't dynamic either.

Biff

"Joe" wrote in message
...
The other thing a forgot to mention is that when pasting your formula
into
the large array (20X4000) it takes literally many minutes to load and
calculate and can hang the program if it runs out of resources
(memory?).
I
have 2GB of physical memory. There are no other applications running
at
the
same time apart from the usual background ones.

"T. Valko" wrote:

Is this a bug in excel or is it something to do with the
array not having a labels row?

Not a bug. The filter "assumes" the first entry is a column label
(header)
and brings that entry with it. So, if you had:

WordA
WordA

The filter assumes the first WordA is the column header and the
second
WordA
is a unique entry. So it extracts both.

Usually when you don't have a "distinctive" header Excel will
"complain"
and
ask you if your data has a header and how you want to handle it.

Biff

"Joe" wrote in message
...
The equation works well.
I noticed that when using DataFilterAdvanced filter after
selecting
A1
then "Copy to another location" and "Unique records only" ticked
It would produce a duplicate of the first two rows if they were
actually
duplicates as in my example. The list range was $A$1:$A$6
Is this a bug in excel or is it something to do with the array not
having
a
labels row?

"T. Valko" wrote:

Here's one way.

Use the Advanced filter to extract the uniques from column A to
another
location:

Select cell A1
Goto the menu DataFilterAdvanced filter
Select: Copy to another location
List range: $A$1:$A$4000
Copy to: $F$1
Unique records only
OK

Enter this array** formula in G1:

=IF(COLUMNS($A:A)<=COUNTIF($A$1:$A$4000,$F1),SMALL (IF($A$1:$A$4000=$F1,$B$1:$B$4000),COLUMNS($A:A)), "")

You'll have to copy across to a number of cells that is equal to
the
maximum
count of any single word from column A. In your posted example
WordB
appeared the most times, 3. So you'll have to copy the formula
across
to
a
total of 3 columns. Then copy down to the end of the list in
column
F.

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER)

Biff

"Joe" wrote in message
...
I have two large columns arrays A1:A4000 and B1:B4000
A: contains words alphabetically sorted and B: their
corresponding
page
numbers.
I need to reduce all duplicates of a word to one row with all
their
number
occurences on adjacent column cells
of the same row.
For example
A B C D E F
word A 21 word A 21 25
word A 25 to word B 10 15 17
word B 10 word C 3
word B 15
word B 17
word C 3
How do I do this?
















All times are GMT +1. The time now is 12:07 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com