Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 3
Default Counting unique numbers

Hello everyone,

I have a large set of data that has in one column, the company id
which repeats on consecutive rows for each year in the dataset. Then
in the next column, there's the ID of each segment within the company
(a number). I need to count for each company code, how many segments
(unique numbers) are there. The data for a small sample looks like
this:

COMP-ID SEG-ID
1001 1
1001 1
1001 1
1003 1
1003 1
1003 1
1003 1
1004 3
1004 4
1004 5
1004 3
1004 4
1004 5
1004 3

in this case I would need to generate a list like this

1001 1
1003 1
1004 3

which means that company 1001 has 1 segment, 1003 has 1 segment and
company 1004 has 3 segments (segments 3, 4 and 5).

I hope anybody can shed some light on how to tackle this one because I
haven't been able to figure something out. Thanks in advance,

Juan Marin
  #2   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 15,768
Default Counting unique numbers

Try this array formula** :

Data in the range A2:B15
Assuming no empty cells in column B

D2:Dn = 1001, 1003, 1004 etc

Array entered** in E2 and copied down as needed:

=SUM(IF(FREQUENCY(IF(A$2:A$15=D2,B$2:B$15),B$2:B$1 5),1))

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER). Hold down both the CTRL key and the SHIFT
key then hit ENTER.

--
Biff
Microsoft Excel MVP


"Juan Marin" wrote in message
...
Hello everyone,

I have a large set of data that has in one column, the company id
which repeats on consecutive rows for each year in the dataset. Then
in the next column, there's the ID of each segment within the company
(a number). I need to count for each company code, how many segments
(unique numbers) are there. The data for a small sample looks like
this:

COMP-ID SEG-ID
1001 1
1001 1
1001 1
1003 1
1003 1
1003 1
1003 1
1004 3
1004 4
1004 5
1004 3
1004 4
1004 5
1004 3

in this case I would need to generate a list like this

1001 1
1003 1
1004 3

which means that company 1001 has 1 segment, 1003 has 1 segment and
company 1004 has 3 segments (segments 3, 4 and 5).

I hope anybody can shed some light on how to tackle this one because I
haven't been able to figure something out. Thanks in advance,

Juan Marin



  #3   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 5,651
Default Counting unique numbers

On Mon, 10 Aug 2009 21:07:01 -0700 (PDT), Juan Marin
wrote:

Hello everyone,

I have a large set of data that has in one column, the company id
which repeats on consecutive rows for each year in the dataset. Then
in the next column, there's the ID of each segment within the company
(a number). I need to count for each company code, how many segments
(unique numbers) are there. The data for a small sample looks like
this:

COMP-ID SEG-ID
1001 1
1001 1
1001 1
1003 1
1003 1
1003 1
1003 1
1004 3
1004 4
1004 5
1004 3
1004 4
1004 5
1004 3

in this case I would need to generate a list like this

1001 1
1003 1
1004 3

which means that company 1001 has 1 segment, 1003 has 1 segment and
company 1004 has 3 segments (segments 3, 4 and 5).

I hope anybody can shed some light on how to tackle this one because I
haven't been able to figure something out. Thanks in advance,

Juan Marin


Add a column to your data.

If your has the column labels in A1:B1, and the data starts in A2, then

C1: Unique Segs
C2: =IF(SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))1,0,1)

Fill down as far as needed.

Then Insert/Pivot Table.

COMP-ID to Row area
Unique Segs to Values or Data area

Make sure your "Unique Segs" value field setting is set to do a SUM
--ron
  #4   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 3
Default Counting unique numbers

Thank you so much Ron and Biff, both solutions work pretty well.
Nevertheless I'm encountering one more difficulty, my data set is
really large (+400,000 rows) so pasting down any of these formulas is
not practical because as I paste them down, they stay fixed to the
first cell of the range. However, I know that the data is ordered, and
that in any case, if I were able to look just 100 rows forward and 100
rows ahead, that would be enough. I've been trying to fix both
formulas with ADDRESS() in such a way that I could vary the initial
and ending rows in such way, but I haven't been able to do so
effectively. Any thoughts? Thank you again.

JM
  #5   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 15,768
Default Counting unique numbers

my data set is really large (+400,000 rows)

Yeah, that could slow things down when you're counting uniques on that size
range!

I know that the data is ordered


Despite the length of this formula it's much better! It narrows down the
range to only those cells that are of the same ID.

Array entered**

=SUM(IF(FREQUENCY(IF(INDEX(A:A,MATCH(D2,A:A,0)):IN DEX(A:A,MATCH(D2,A:A,0)+COUNTIF(A:A,D2)-1)=D2,INDEX(B:B,MATCH(D2,A:A,0)):INDEX(B:B,MATCH(D 2,A:A,0)+COUNTIF(A:A,D2)-1)),INDEX(B:B,MATCH(D2,A:A,0)):INDEX(B:B,MATCH(D2, A:A,0)+COUNTIF(A:A,D2)-1)),1))

If you can download and install the free Morefunc add-in from:

http://xcell05.free.fr/morefunc/english/index.htm

Alternate download site:

http://www.download.com/Morefunc/300...-10423159.html

Then you can use this slightly shorter array entered** version:

=SUM(IF(FREQUENCY(IF(INDEX(A:A,MATCH(D2,A:A,0)):IN DEX(A:A,MATCH(D2,A:A,0)+COUNTIF(A:A,D2)-1)=D2,SETV(INDEX(B:B,MATCH(D2,A:A,0)):INDEX(B:B,MA TCH(D2,A:A,0)+COUNTIF(A:A,D2)-1))),GETV()),1))** array formulas need to be entered using the key combination ofCTRL,SHIFT,ENTER (not just ENTER). Hold down both the CTRL key and the SHIFTkey then hit ENTER.--BiffMicrosoft Excel MVP"Juan Marin" wrote in ... Thank you so much Ron and Biff, both solutions work pretty well. Nevertheless I'm encountering one more difficulty, my data set is really large (+400,000 rows) so pasting down any of these formulas is not practical because as I paste them down, they stay fixed to the first cell of the range. However, I know that the data is ordered, and that in any case, if I were able to look just 100 rows forward and 100 rows ahead, that would be enough. I've been trying to fix both formulas with ADDRESS() in such a way that I could vary the initial and ending rows in such way, but I haven't been able to do so effectively. Any thoughts? Thank you again. JM



  #6   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 806
Default Counting unique numbers

Hello Juan,

I would not use worksheet functions to calculate this. See here,
please:
http://sulprobil.com/html/count_unique.html

I would suggest to use my UDF Pfreq:
http://sulprobil.com/html/pfreq.html
Use the =Pfreq(Pfreq(A1:A400000,B1:B400000)) approach if its working.

Try my UDF Pstat if your data amount is too large for Pfreq.
http://sulprobil.com/html/pstat.html

Regards,
Bernd

  #7   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 272
Default Counting unique numbers

With this amount of data, perhaps you could use the
remove duplicates Excel 07 command or advanced filter for
unique values and then use a PivotTable or formula?

Otherwise try the Import Data command
(under Data Import External Data in XL02 menus)

Select Excel files, locate the current file and click Open.
Choose the Data range eg Sheet1 (or a named range) then OK.
Now Choose Edit Query, Command Type: SQL and Command Text:

SELECT [COMP-ID],COUNT(*) AS [Segs] FROM
(SELECT DISTINCT * FROM [Sheet1$]) GROUP BY [COMP-ID]

Click OK, select the output cell and click OK again.

For 65536 rows and around 10000 COMP-IDs this takes only a
second to run in tests, other solutions froze the application.


"Juan Marin" wrote:

Hello everyone,

I have a large set of data that has in one column, the company id
which repeats on consecutive rows for each year in the dataset. Then
in the next column, there's the ID of each segment within the company
(a number). I need to count for each company code, how many segments
(unique numbers) are there. The data for a small sample looks like
this:

COMP-ID SEG-ID
1001 1
1001 1
1001 1
1003 1
1003 1
1003 1
1003 1
1004 3
1004 4
1004 5
1004 3
1004 4
1004 5
1004 3

in this case I would need to generate a list like this

1001 1
1003 1
1004 3

which means that company 1001 has 1 segment, 1003 has 1 segment and
company 1004 has 3 segments (segments 3, 4 and 5).

I hope anybody can shed some light on how to tackle this one because I
haven't been able to figure something out. Thanks in advance,

Juan Marin

  #8   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 806
Default Counting unique numbers

Hello Lori,

400,000 records are admittedly asking for a database approach.

I optimized the memory strategy of my UDF Pfreq and its working now
for 65,536 rows with 10,000 different items. Runtime on my dualcore
proc: about 6 seconds.

Regards,
Bernd
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
counting and unique kr Excel Discussion (Misc queries) 0 January 29th 09 01:54 AM
Counting Unique tonyalt3 Excel Worksheet Functions 1 April 16th 08 11:32 PM
Unique counting Michael Excel Discussion (Misc queries) 19 November 27th 06 09:28 AM
Counting Unique Part Numbers In A Range BigH Excel Worksheet Functions 2 December 9th 05 07:09 PM
counting unique numbers in filtered data deb Excel Worksheet Functions 1 September 22nd 05 09:41 PM


All times are GMT +1. The time now is 01:30 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"