Home |
Search |
Today's Posts |
#1
![]() |
|||
|
|||
![]()
I have a list of builders where I want to extract all the unique values.
Using =IF(COUNTIF($A$1:A1,A1)=1,A!,"") works fine to get the first set of unique values, but I need to extract it a bit further. The builder names may be Ryan 50, Ryan 60 or just Ryan. I need to combine those values to come up with the final list of unique builders that just says Ryan. I'm not sure if I can strip off anything from the end as some are Ryan - Greenbrier. I thought maybe removing everything to the right of a blank space, but those entries have 2 blanks. Any help would be greatly appreciated! Thanks! |
#2
![]() |
|||
|
|||
![]()
Try using the filter (data filter) and then in the drop down list, select
"custom" and enter the custom feature you're looking for eg. contains "ryan". "Karl Burrows" wrote: I have a list of builders where I want to extract all the unique values. Using =IF(COUNTIF($A$1:A1,A1)=1,A!,"") works fine to get the first set of unique values, but I need to extract it a bit further. The builder names may be Ryan 50, Ryan 60 or just Ryan. I need to combine those values to come up with the final list of unique builders that just says Ryan. I'm not sure if I can strip off anything from the end as some are Ryan - Greenbrier. I thought maybe removing everything to the right of a blank space, but those entries have 2 blanks. Any help would be greatly appreciated! Thanks! |
#3
![]() |
|||
|
|||
![]()
It can't use a filter, it has to be derived from a formula. It drives other
reporting. In addition, there are other builders, so they would have to be filtered as well. I almost need something like reverse concatenation. "James Hamilton" wrote in message ... Try using the filter (data filter) and then in the drop down list, select "custom" and enter the custom feature you're looking for eg. contains "ryan". "Karl Burrows" wrote: I have a list of builders where I want to extract all the unique values. Using =IF(COUNTIF($A$1:A1,A1)=1,A!,"") works fine to get the first set of unique values, but I need to extract it a bit further. The builder names may be Ryan 50, Ryan 60 or just Ryan. I need to combine those values to come up with the final list of unique builders that just says Ryan. I'm not sure if I can strip off anything from the end as some are Ryan - Greenbrier. I thought maybe removing everything to the right of a blank space, but those entries have 2 blanks. Any help would be greatly appreciated! Thanks! |
#4
![]() |
|||
|
|||
![]()
what do you mean by reverse concatenation?
ps - you should be storing DATA in a DATABASE and not in excel. -Aaron |
#5
![]() |
|||
|
|||
![]()
It is in a database, but am using Excel to pull some formatted reports.
Here is a better example, for multiple builders: Ryan Ryan Townhomes Ryan 60' Mulvaney - Greenbrier Mulvaney - 80 KB Home KB Home 70' I need to extract the unique values to give me: Ryan Mulvaney KB Home What I am doing is about impossible in Access, as it is pulling lot data into a formatted report spread over a 6 year rolling period. I would still have the same issue in Access as well if I were to query the data for these values. Thanks! wrote in message oups.com... what do you mean by reverse concatenation? ps - you should be storing DATA in a DATABASE and not in excel. -Aaron |
#6
![]() |
|||
|
|||
![]()
I note that there is a space between the unique information eg. "ryan" and
the other data on the end eg "townhomes"....so try this: 1. highlight the column with the data in it 2. Go to DATA TEXT TO COLUMNS, then make sure "delimited" is selected and hit NEXT, then make sure that "SPACE" is checked, then hit NEXT, then FINISH. Can you let me know if this works? "Karl Burrows" wrote: It is in a database, but am using Excel to pull some formatted reports. Here is a better example, for multiple builders: Ryan Ryan Townhomes Ryan 60' Mulvaney - Greenbrier Mulvaney - 80 KB Home KB Home 70' I need to extract the unique values to give me: Ryan Mulvaney KB Home What I am doing is about impossible in Access, as it is pulling lot data into a formatted report spread over a 6 year rolling period. I would still have the same issue in Access as well if I were to query the data for these values. Thanks! wrote in message oups.com... what do you mean by reverse concatenation? ps - you should be storing DATA in a DATABASE and not in excel. -Aaron |
#7
![]() |
|||
|
|||
![]()
Karl,
You could use a helper column to get the sans number values =SUBSTITUTE(A2,MID(A2,MATCH(FALSE,ISERROR(1*MID(A2 ,ROW(INDIRECT("1:10")),1)) ,0),10-SUM(1*ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1)))) *1,"") and then count uniques here. The formula is an array formula, so commit with Ctrl-Shift-Enter. -- HTH Bob Phillips "Karl Burrows" wrote in message ... It is in a database, but am using Excel to pull some formatted reports. Here is a better example, for multiple builders: Ryan Ryan Townhomes Ryan 60' Mulvaney - Greenbrier Mulvaney - 80 KB Home KB Home 70' I need to extract the unique values to give me: Ryan Mulvaney KB Home What I am doing is about impossible in Access, as it is pulling lot data into a formatted report spread over a 6 year rolling period. I would still have the same issue in Access as well if I were to query the data for these values. Thanks! wrote in message oups.com... what do you mean by reverse concatenation? ps - you should be storing DATA in a DATABASE and not in excel. -Aaron |
#8
![]() |
|||
|
|||
![]()
Assuming that Column A contains your data, enter the following formula
that needs to be confirmed with CONTROL+SHIFT+ENTER in B1 and copy down: =INDEX($D$1:$D$3,MATCH(TRUE,ISNUMBER(SEARCH($D$1:$ D$3,A1)),0)) ....where D1:D3 contains the values to extract, such as Ryan, Mulvaney, and KB Home. Hope this helps! In article , "Karl Burrows" wrote: It is in a database, but am using Excel to pull some formatted reports. Here is a better example, for multiple builders: Ryan Ryan Townhomes Ryan 60' Mulvaney - Greenbrier Mulvaney - 80 KB Home KB Home 70' I need to extract the unique values to give me: Ryan Mulvaney KB Home What I am doing is about impossible in Access, as it is pulling lot data into a formatted report spread over a 6 year rolling period. I would still have the same issue in Access as well if I were to query the data for these values. Thanks! wrote in message oups.com... what do you mean by reverse concatenation? ps - you should be storing DATA in a DATABASE and not in excel. -Aaron |
#9
![]() |
|||
|
|||
![]()
Where does this assume the OP's data is, and in what row(s) of the
helper column is it to be array entered? Alan Beban Bob Phillips wrote: Karl, You could use a helper column to get the sans number values =SUBSTITUTE(A2,MID(A2,MATCH(FALSE,ISERROR(1*MID(A2 ,ROW(INDIRECT("1:10")),1)) ,0),10-SUM(1*ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1)))) *1,"") and then count uniques here. The formula is an array formula, so commit with Ctrl-Shift-Enter. |
#10
![]() |
|||
|
|||
![]()
I have tried text to columns, but it tries to split out the formula instead
of the value in the cell (even if I create a new cell and convert it to text). Then, there are some builders that have several spaces in the name (KB Home 60). I tried to use LEFT to pull out to the next blank space, but that only pulled KB! "James Hamilton" wrote in message ... I note that there is a space between the unique information eg. "ryan" and the other data on the end eg "townhomes"....so try this: 1. highlight the column with the data in it 2. Go to DATA TEXT TO COLUMNS, then make sure "delimited" is selected and hit NEXT, then make sure that "SPACE" is checked, then hit NEXT, then FINISH. Can you let me know if this works? "Karl Burrows" wrote: It is in a database, but am using Excel to pull some formatted reports. Here is a better example, for multiple builders: Ryan Ryan Townhomes Ryan 60' Mulvaney - Greenbrier Mulvaney - 80 KB Home KB Home 70' I need to extract the unique values to give me: Ryan Mulvaney KB Home What I am doing is about impossible in Access, as it is pulling lot data into a formatted report spread over a 6 year rolling period. I would still have the same issue in Access as well if I were to query the data for these values. Thanks! wrote in message oups.com... what do you mean by reverse concatenation? ps - you should be storing DATA in a DATABASE and not in excel. -Aaron |
#11
![]() |
|||
|
|||
![]()
Well, perhaps I'm not being imaginative enough, but it is difficult for
me to conceive of a formula that will be able to treat Ryan Townhomes like a duplicate of Ryan and not treat KB Home like a duplicate of KB. Alan Beban Karl Burrows wrote: It is in a database, but am using Excel to pull some formatted reports. Here is a better example, for multiple builders: Ryan Ryan Townhomes Ryan 60' Mulvaney - Greenbrier Mulvaney - 80 KB Home KB Home 70' I need to extract the unique values to give me: Ryan Mulvaney KB Home What I am doing is about impossible in Access, as it is pulling lot data into a formatted report spread over a 6 year rolling period. I would still have the same issue in Access as well if I were to query the data for these values. Thanks! wrote in message oups.com... what do you mean by reverse concatenation? ps - you should be storing DATA in a DATABASE and not in excel. -Aaron |
#12
![]() |
|||
|
|||
![]()
That is really close! It works on about 50% of the builder names.
Examples: Works on: Beazer Homes 50' & 60' D.R. Horton 40 & 50 Ryan - Greenbrier Ryan - Aldridge Ryan Townhomes Westminster Townhomes KB Home 40 & 50 Does not work on: C.P. Morgan 40' & 50' Lennar Homes 40, 50, 55 & 60 McCar Homes & McCar Townhomes Mulvaney 60 Mulvaney Homes Mulvaney Townhomes Should convert to: C.P. Morgan Lennar Homes McCar Homes Mulvaney Strange how it works on some similar and not others. This is really good! Thanks! "Domenic" wrote in message ... Assuming that Column A contains your data, enter the following formula that needs to be confirmed with CONTROL+SHIFT+ENTER in B1 and copy down: =INDEX($D$1:$D$3,MATCH(TRUE,ISNUMBER(SEARCH($D$1:$ D$3,A1)),0)) ....where D1:D3 contains the values to extract, such as Ryan, Mulvaney, and KB Home. Hope this helps! In article , "Karl Burrows" wrote: It is in a database, but am using Excel to pull some formatted reports. Here is a better example, for multiple builders: Ryan Ryan Townhomes Ryan 60' Mulvaney - Greenbrier Mulvaney - 80 KB Home KB Home 70' I need to extract the unique values to give me: Ryan Mulvaney KB Home What I am doing is about impossible in Access, as it is pulling lot data into a formatted report spread over a 6 year rolling period. I would still have the same issue in Access as well if I were to query the data for these values. Thanks! wrote in message oups.com... what do you mean by reverse concatenation? ps - you should be storing DATA in a DATABASE and not in excel. -Aaron |
#13
![]() |
|||
|
|||
![]()
That works in some values, but not others. Pulls KB Home as KB, Lennar
Homes as Lennar, Saussy Burbank as Saussy, D.R. Horton as D.R. It does work on the ones that are only one name builders like Mulvaney, though. This is a tough one! "Bob Phillips" wrote in message ... Karl, You could use a helper column to get the sans number values =SUBSTITUTE(A2,MID(A2,MATCH(FALSE,ISERROR(1*MID(A2 ,ROW(INDIRECT("1:10")),1)) ,0),10-SUM(1*ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1)))) *1,"") and then count uniques here. The formula is an array formula, so commit with Ctrl-Shift-Enter. -- HTH Bob Phillips "Karl Burrows" wrote in message ... It is in a database, but am using Excel to pull some formatted reports. Here is a better example, for multiple builders: Ryan Ryan Townhomes Ryan 60' Mulvaney - Greenbrier Mulvaney - 80 KB Home KB Home 70' I need to extract the unique values to give me: Ryan Mulvaney KB Home What I am doing is about impossible in Access, as it is pulling lot data into a formatted report spread over a 6 year rolling period. I would still have the same issue in Access as well if I were to query the data for these values. Thanks! wrote in message oups.com... what do you mean by reverse concatenation? ps - you should be storing DATA in a DATABASE and not in excel. -Aaron |
#14
![]() |
|||
|
|||
![]()
It is tough when builder names vary from one word names to initials and name
and the multiple word names. "Alan Beban" wrote in message ... Well, perhaps I'm not being imaginative enough, but it is difficult for me to conceive of a formula that will be able to treat Ryan Townhomes like a duplicate of Ryan and not treat KB Home like a duplicate of KB. Alan Beban Karl Burrows wrote: It is in a database, but am using Excel to pull some formatted reports. Here is a better example, for multiple builders: Ryan Ryan Townhomes Ryan 60' Mulvaney - Greenbrier Mulvaney - 80 KB Home KB Home 70' I need to extract the unique values to give me: Ryan Mulvaney KB Home What I am doing is about impossible in Access, as it is pulling lot data into a formatted report spread over a 6 year rolling period. I would still have the same issue in Access as well if I were to query the data for these values. Thanks! wrote in message oups.com... what do you mean by reverse concatenation? ps - you should be storing DATA in a DATABASE and not in excel. -Aaron |
#15
![]() |
|||
|
|||
![]()
Karl Burrows wrote...
It is in a database, but am using Excel to pull some formatted reports. Here is a better example, for multiple builders: Ryan Ryan Townhomes Ryan 60' Mulvaney - Greenbrier Mulvaney - 80 KB Home KB Home 70' I need to extract the unique values to give me: Ryan Mulvaney KB Home .... It's not too difficult to get Ryan and KB Home using the following udf, which returns an array. Function foo(r As Range) As Variant Dim d As Object Dim c As Variant, x As Variant, t As String Set d = CreateObject("Scripting.Dictionary") For Each c In r t = c.Text If d.Exists(t) Then d.Item(t) = d.Item(t) + 1 _ Else d.Add Key:=t, Item:=1 Next c For Each c In d.Keys For Each x In d.Keys If x < c And x Like c & "*" Then d.Remove Key:=x Next x Next c foo = Application.WorksheetFunction.Transpose(d.Keys) End Function But reducing out Mulvaney is much more difficult in general because left substrings could be common to several distinct records, e.g., John Smith Builders John Smith & Sons Construction There may be approaches you could take using fuzzy string matching, but you may find it expedient to use the udf above to filter out most 'duplicates', then remove the remaining ones manually. |
#16
![]() |
|||
|
|||
![]()
If A2:A14 contains...
Beazer Homes 50' & 60' D.R. Horton 40 & 50 Ryan - Greenbrier Ryan - Aldridge Ryan Townhomes Westminster Townhomes KB Home 40 & 50 C.P. Morgan 40' & 50' Lennar Homes 40, 50, 55 & 60 McCar Homes & McCar Townhomes Mulvaney 60 Mulvaney Homes Mulvaney Townhomes ....and D2:D10 contains... Beazer C.P. Morgan D.R. Horton KB Home Lennar McCar Mulvaney Ryan Westminster ....enter the following formula in B2, and copy down... =INDEX($D$2:$D$10,MATCH(TRUE,ISNUMBER(SEARCH($D$2: $D$10,A2)),0)) ....confirmed with CONTROL+SHIFT+ENTER. You should get the following results... Beazer D.R. Horton Ryan Ryan Ryan Westminster KB Home C.P. Morgan Lennar McCar Mulvaney Mulvaney Mulvaney Then, if you want a unique list -- it only just dawned on me that this is probably what you want :) -- you can either use 'Advanced Filter' and check 'Unique records only' or use the following formula... C2, copied down until you get #N/A: =INDEX(B2:$B$14,MATCH(0,COUNTIF($C$1:C1,B2:$B$14), 0)) ....confirmed with CONTROL+SHIFT+ENTER. Hope this helps! In article , "Karl Burrows" wrote: That is really close! It works on about 50% of the builder names. Examples: Works on: Beazer Homes 50' & 60' D.R. Horton 40 & 50 Ryan - Greenbrier Ryan - Aldridge Ryan Townhomes Westminster Townhomes KB Home 40 & 50 Does not work on: C.P. Morgan 40' & 50' Lennar Homes 40, 50, 55 & 60 McCar Homes & McCar Townhomes Mulvaney 60 Mulvaney Homes Mulvaney Townhomes Should convert to: C.P. Morgan Lennar Homes McCar Homes Mulvaney Strange how it works on some similar and not others. This is really good! Thanks! |
#17
![]() |
|||
|
|||
![]()
Problem is I don't have a column of named builders like you have in D2:D10.
Maybe this would be easier. Here is the exact list of builders they have. This can also change as new subdivisions are added, but the list is populated from an Access database. Builders may have more than one type of lot in one subdivision or may have lots in multiple subdivisions, which is what creates the need to identify builders this way for reporting. I don't think this is going to be possible since Excel would have no way of knowing that "Greenbrier" was not part of the builder name. Beazer Homes Beazer Homes 50' Beazer Homes 60' C.P. Morgan 40' C.P. Morgan 50' D.R. Horton D.R. Horton D.R. Horton 40 D.R. Horton 50 KB Home KB Home 40 KB Home 50 Lennar Homes 40 Lennar Homes 50 Lennar Homes 55 Lennar Homes 60 McCar Homes McCar Townhomes Mulvaney 60 Mulvaney Homes Mulvaney Townhomes Pulte Pulte Ryan Ryan - Aldridge Ryan - Greenbrier Ryan 60 Ryan 66 Ryan Townhomes Ryan Townhomes Saussy Burbank Unsold Townhome Lots Westminster Westminster Townhomes Thanks! "Domenic" wrote in message ... If A2:A14 contains... Beazer Homes 50' & 60' D.R. Horton 40 & 50 Ryan - Greenbrier Ryan - Aldridge Ryan Townhomes Westminster Townhomes KB Home 40 & 50 C.P. Morgan 40' & 50' Lennar Homes 40, 50, 55 & 60 McCar Homes & McCar Townhomes Mulvaney 60 Mulvaney Homes Mulvaney Townhomes ....and D2:D10 contains... Beazer C.P. Morgan D.R. Horton KB Home Lennar McCar Mulvaney Ryan Westminster ....enter the following formula in B2, and copy down... =INDEX($D$2:$D$10,MATCH(TRUE,ISNUMBER(SEARCH($D$2: $D$10,A2)),0)) ....confirmed with CONTROL+SHIFT+ENTER. You should get the following results... Beazer D.R. Horton Ryan Ryan Ryan Westminster KB Home C.P. Morgan Lennar McCar Mulvaney Mulvaney Mulvaney Then, if you want a unique list -- it only just dawned on me that this is probably what you want :) -- you can either use 'Advanced Filter' and check 'Unique records only' or use the following formula... C2, copied down until you get #N/A: =INDEX(B2:$B$14,MATCH(0,COUNTIF($C$1:C1,B2:$B$14), 0)) ....confirmed with CONTROL+SHIFT+ENTER. Hope this helps! In article , "Karl Burrows" wrote: That is really close! It works on about 50% of the builder names. Examples: Works on: Beazer Homes 50' & 60' D.R. Horton 40 & 50 Ryan - Greenbrier Ryan - Aldridge Ryan Townhomes Westminster Townhomes KB Home 40 & 50 Does not work on: C.P. Morgan 40' & 50' Lennar Homes 40, 50, 55 & 60 McCar Homes & McCar Townhomes Mulvaney 60 Mulvaney Homes Mulvaney Townhomes Should convert to: C.P. Morgan Lennar Homes McCar Homes Mulvaney Strange how it works on some similar and not others. This is really good! Thanks! |
#18
![]() |
|||
|
|||
![]()
In article ,
"Karl Burrows" wrote: Problem is I don't have a column of named builders like you have in D2:D10. I should have said, 'enter a list of builders in D2:D10', as described. So with the list of builders you've now provided, enter the following list in D2:D13... Beazer C.P. Morgan D.R. Horton KB Lennar McCar Mulvaney Pulte Ryan Saussy Unsold Westminster ....and change the references in the formula accordingly. Does this help? |
#19
![]() |
|||
|
|||
![]()
I don't know why you picked "Greenbrier" as a problem. Harlan Grove
provided code that would eliminate it when operating on the list below. The real problem seems to be that your list of "builders names" does not include all the names of the builders. In particular, it does not include C.P. Morgan Lennar Homes McCar Mulvaney each of which appears from your descriptions to be a builder's name. (I am assuming that "Unsold Townhome Lots" is equivalent to a builder's name.) Alan Beban Karl Burrows wrote: Problem is I don't have a column of named builders like you have in D2:D10. Maybe this would be easier. Here is the exact list of builders they have. This can also change as new subdivisions are added, but the list is populated from an Access database. Builders may have more than one type of lot in one subdivision or may have lots in multiple subdivisions, which is what creates the need to identify builders this way for reporting. I don't think this is going to be possible since Excel would have no way of knowing that "Greenbrier" was not part of the builder name. Beazer Homes Beazer Homes 50' Beazer Homes 60' C.P. Morgan 40' C.P. Morgan 50' D.R. Horton D.R. Horton D.R. Horton 40 D.R. Horton 50 KB Home KB Home 40 KB Home 50 Lennar Homes 40 Lennar Homes 50 Lennar Homes 55 Lennar Homes 60 McCar Homes McCar Townhomes Mulvaney 60 Mulvaney Homes Mulvaney Townhomes Pulte Pulte Ryan Ryan - Aldridge Ryan - Greenbrier Ryan 60 Ryan 66 Ryan Townhomes Ryan Townhomes Saussy Burbank Unsold Townhome Lots Westminster Westminster Townhomes Thanks! |
#20
![]() |
|||
|
|||
![]()
I think we have gotten away from the actual problem. It doesn't matter who
the builder is (I only originally posted a partial list of builders), I want to programmatically or through a formula remove all the uncommon information from any builder that may be there at some point in time. Builders may have a suffix to further define the lots they own in a neighborhood, so it doesn't matter who is in my list now. I also don't want to create a list of the builders to match to the list to create the unique values. I am doing that already and want to eliminate that portion of the worksheet input. It boils down to taking a column of builder names and stripping off the data that is not common to all values for that builder, whether that means removing 60', - 60, - Greenbrier, or any other designation they may come up with. I think that is where the hang up is. Unless you can compare all the builder values and then say well these are pretty much alike other than the "60" at the end or "townhome" or other, so let's remove that portion. I haven't had a chance to try to coding and maybe that is what it does. Does all this make sense? Thanks for everyone's patience. "Alan Beban" wrote in message ... I don't know why you picked "Greenbrier" as a problem. Harlan Grove provided code that would eliminate it when operating on the list below. The real problem seems to be that your list of "builders names" does not include all the names of the builders. In particular, it does not include C.P. Morgan Lennar Homes McCar Mulvaney each of which appears from your descriptions to be a builder's name. (I am assuming that "Unsold Townhome Lots" is equivalent to a builder's name.) Alan Beban Karl Burrows wrote: Problem is I don't have a column of named builders like you have in D2:D10. Maybe this would be easier. Here is the exact list of builders they have. This can also change as new subdivisions are added, but the list is populated from an Access database. Builders may have more than one type of lot in one subdivision or may have lots in multiple subdivisions, which is what creates the need to identify builders this way for reporting. I don't think this is going to be possible since Excel would have no way of knowing that "Greenbrier" was not part of the builder name. Beazer Homes Beazer Homes 50' Beazer Homes 60' C.P. Morgan 40' C.P. Morgan 50' D.R. Horton D.R. Horton D.R. Horton 40 D.R. Horton 50 KB Home KB Home 40 KB Home 50 Lennar Homes 40 Lennar Homes 50 Lennar Homes 55 Lennar Homes 60 McCar Homes McCar Townhomes Mulvaney 60 Mulvaney Homes Mulvaney Townhomes Pulte Pulte Ryan Ryan - Aldridge Ryan - Greenbrier Ryan 60 Ryan 66 Ryan Townhomes Ryan Townhomes Saussy Burbank Unsold Townhome Lots Westminster Westminster Townhomes Thanks! |
#21
![]() |
|||
|
|||
![]()
Karl Burrows wrote...
.... It boils down to taking a column of builder names and stripping off the data that is not common to all values for that builder, whether that means removing 60', - 60, - Greenbrier, or any other designation they may come up with. I think that is where the hang up is. Unless you can compare all the builder values and then say well these are pretty much alike other than the "60" at the end or "townhome" or other, so let's remove that portion. I haven't had a chance to try to coding and maybe that is what it does. .... The problem is that some of the qualifiers added to some of the builder names could be legitimate parts of a person's or company's name. I'm not saying that's in fact the case, just that it could be. For example, Home and House can be surnames. If the only added qualifiers you have to deal with involve anything beginning with a decimal numeral or a hyphen, you could use regular expressions to remove them. But you also have normal words appended with no more than a space separating them from the builder name. Unless *YOU* could compile an exhaustive list of such words that would always be deleted and never erroneously truncate any builder's name, then you could use a list of these words as tokens to remove from your records. Then feed what's left through a dictionary object to eliminate duplicates. |
#22
![]() |
|||
|
|||
![]()
Thus, the problem. Because each builder can have a unique name with some of
those qualifiers in there, it is about impossible to identify their true name. I think maybe developing a naming convention to add the hyphen or something else that we can tell it to strip everything right of that character is going to be the only way to go. As it is now, it is truly "fuzzy logic!" Thanks for all the help! "Harlan Grove" wrote in message oups.com... Karl Burrows wrote... .... It boils down to taking a column of builder names and stripping off the data that is not common to all values for that builder, whether that means removing 60', - 60, - Greenbrier, or any other designation they may come up with. I think that is where the hang up is. Unless you can compare all the builder values and then say well these are pretty much alike other than the "60" at the end or "townhome" or other, so let's remove that portion. I haven't had a chance to try to coding and maybe that is what it does. .... The problem is that some of the qualifiers added to some of the builder names could be legitimate parts of a person's or company's name. I'm not saying that's in fact the case, just that it could be. For example, Home and House can be surnames. If the only added qualifiers you have to deal with involve anything beginning with a decimal numeral or a hyphen, you could use regular expressions to remove them. But you also have normal words appended with no more than a space separating them from the builder name. Unless *YOU* could compile an exhaustive list of such words that would always be deleted and never erroneously truncate any builder's name, then you could use a list of these words as tokens to remove from your records. Then feed what's left through a dictionary object to eliminate duplicates. |
#23
![]() |
|||
|
|||
![]()
Harlan Grove wrote:
Karl Burrows wrote... ... It boils down to taking a column of builder names and stripping off the data that is not common to all values for that builder, whether that means removing 60', - 60, - Greenbrier, or any other designation they may come up with. I think that is where the hang up is. Unless you can compare all the builder values and then say well these are pretty much alike other than the "60" at the end or "townhome" or other, so let's remove that portion. I haven't had a chance to try to coding and maybe that is what it does. ... The problem is that some of the qualifiers added to some of the builder names could be legitimate parts of a person's or company's name. I'm not saying that's in fact the case, just that it could be. . . . Unless I misunderstand, we already know from the OP's 3rd posting in this thread that it is the case. He made it clear that in Ryan Townhomes, the builder's name is Ryan and Townhomes is a suffix; and that in KB Home the builder's name is KB Home and Home is not a suffix. Alan Beban |
#24
![]() |
|||
|
|||
![]()
"Alan Beban" wrote...
.... Unless I misunderstand, we already know from the OP's 3rd posting in this thread that it is the case. He made it clear that in Ryan Townhomes, the builder's name is Ryan and Townhomes is a suffix; and that in KB Home the builder's name is KB Home and Home is not a suffix. Then it's the same situation as parsing surnames from a list of peoples from many original nationalities but with inappropriate English capitalization rules applied. E.g., Charles Der Ruud Van Der Aalter Nguyen Van Tieu You come up with a rule to handle all these correctly, and you may have a prayer handling general company names which follow even fewer rules. [This is rhetorical. It's theoretically possible if there's a sufficiently complete rules base, but it'd be expedient to use an approach that works 80-90% of the time and correct the rest manually.] |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
Populate a column by extracting unique values from another column? | Excel Worksheet Functions | |||
Count Unique Values | Excel Worksheet Functions | |||
Extract AutoFilter Column Values? | Excel Discussion (Misc queries) | |||
Count number of Unique values | Excel Worksheet Functions | |||
Formula to list unique values | Excel Worksheet Functions |