![]() |
removing dupliates from 2 lists with similar content
I have many large lists. Each contains unique entries, as well as duplicate
entires. I need to combine all lists into one long list with no duplicates. Using one field as a primary key value, I want to remove all duplicate entries. Where the data is different but the key entry is the same, I only need to keep the first occurrence. e.g. List 1 key field | name | address | city abcd | A bunch of data, some empty fields..... efgh zzww qnbc List 2 key field | name | address | City| Phone Number kdkd | A bunch of data, some empty fields..... abcd werf asdf asdd zzww poio ---------------------- Note that entries abcd and zzww occur in both lists. Aligning the columns and putting all lists into one big list is simple. Please help me with a method to identify the duplicates and keep only the first occurence of them, based on the key field value. Note that these lists could have many thousands of entries and some entries occur many times, so the simplest method would be most appreciated for the sake of speed. For now, I am pulling the data into a full power text editor and that includes sort and remove duplicate functions. I think I can figure out how to do this in Excel with multiple sorts and 'if this cell is the same as one above it' things, but I am finding it is taking many many steps to accomplish this, specially if there are more than 2 duplicates. Is there a simple function like the POSIX uniq command to do this, preferably in one step? thanks |
removing dupliates from 2 lists with similar content
Build a giant list made up of all the other lists.
Select the key column Data|Filter|advanced filter|Check unique records only Select your range and paste into a new sheet. The visible cells should be the only ones pasted onto that new sheet. excel-lookuper wrote: I have many large lists. Each contains unique entries, as well as duplicate entires. I need to combine all lists into one long list with no duplicates. Using one field as a primary key value, I want to remove all duplicate entries. Where the data is different but the key entry is the same, I only need to keep the first occurrence. e.g. List 1 key field | name | address | city abcd | A bunch of data, some empty fields..... efgh zzww qnbc List 2 key field | name | address | City| Phone Number kdkd | A bunch of data, some empty fields..... abcd werf asdf asdd zzww poio ---------------------- Note that entries abcd and zzww occur in both lists. Aligning the columns and putting all lists into one big list is simple. Please help me with a method to identify the duplicates and keep only the first occurence of them, based on the key field value. Note that these lists could have many thousands of entries and some entries occur many times, so the simplest method would be most appreciated for the sake of speed. For now, I am pulling the data into a full power text editor and that includes sort and remove duplicate functions. I think I can figure out how to do this in Excel with multiple sorts and 'if this cell is the same as one above it' things, but I am finding it is taking many many steps to accomplish this, specially if there are more than 2 duplicates. Is there a simple function like the POSIX uniq command to do this, preferably in one step? thanks -- Dave Peterson |
removing dupliates from 2 lists with similar content
Thanks Dave.
You have answered my question perfectly and I really appreciate it. I always suspected there had to be a way to do this built into Excel, but I never found it. De-duping lists like these has always been a task of heroic proportions for me from within Excel, which is why I always ended up breaking the list out and using external tools/ Shell scripts. Your advice will be a real boon when I need to stay inside Office. Sorry to ask so many questions. Is there a way to do this via formulae? Is my best course of action to do up a simple macro if I need to create a sheet that others can just drop their lists into? Thanks again. Marc "Dave Peterson" wrote: Build a giant list made up of all the other lists. Select the key column Data|Filter|advanced filter|Check unique records only Select your range and paste into a new sheet. The visible cells should be the only ones pasted onto that new sheet. excel-lookuper wrote: I have many large lists. Each contains unique entries, as well as duplicate entires. I need to combine all lists into one long list with no duplicates. Using one field as a primary key value, I want to remove all duplicate entries. Where the data is different but the key entry is the same, I only need to keep the first occurrence. e.g. List 1 key field | name | address | city abcd | A bunch of data, some empty fields..... efgh zzww qnbc List 2 key field | name | address | City| Phone Number kdkd | A bunch of data, some empty fields..... abcd werf asdf asdd zzww poio ---------------------- Note that entries abcd and zzww occur in both lists. Aligning the columns and putting all lists into one big list is simple. Please help me with a method to identify the duplicates and keep only the first occurence of them, based on the key field value. Note that these lists could have many thousands of entries and some entries occur many times, so the simplest method would be most appreciated for the sake of speed. For now, I am pulling the data into a full power text editor and that includes sort and remove duplicate functions. I think I can figure out how to do this in Excel with multiple sorts and 'if this cell is the same as one above it' things, but I am finding it is taking many many steps to accomplish this, specially if there are more than 2 duplicates. Is there a simple function like the POSIX uniq command to do this, preferably in one step? thanks -- Dave Peterson |
removing dupliates from 2 lists with similar content
I'm not sure how you'd approach multiple lists with only formulas. But I bet
you could fiddle your way through it. I can't believe it would be quicker than data|filter|advanced filter, though. You may want to look at Chip Pearson's site: http://www.cpearson.com/excel/duplicat.htm He has lots of techniques there. Marc the confused wrote: Thanks Dave. You have answered my question perfectly and I really appreciate it. I always suspected there had to be a way to do this built into Excel, but I never found it. De-duping lists like these has always been a task of heroic proportions for me from within Excel, which is why I always ended up breaking the list out and using external tools/ Shell scripts. Your advice will be a real boon when I need to stay inside Office. Sorry to ask so many questions. Is there a way to do this via formulae? Is my best course of action to do up a simple macro if I need to create a sheet that others can just drop their lists into? Thanks again. Marc "Dave Peterson" wrote: Build a giant list made up of all the other lists. Select the key column Data|Filter|advanced filter|Check unique records only Select your range and paste into a new sheet. The visible cells should be the only ones pasted onto that new sheet. excel-lookuper wrote: I have many large lists. Each contains unique entries, as well as duplicate entires. I need to combine all lists into one long list with no duplicates. Using one field as a primary key value, I want to remove all duplicate entries. Where the data is different but the key entry is the same, I only need to keep the first occurrence. e.g. List 1 key field | name | address | city abcd | A bunch of data, some empty fields..... efgh zzww qnbc List 2 key field | name | address | City| Phone Number kdkd | A bunch of data, some empty fields..... abcd werf asdf asdd zzww poio ---------------------- Note that entries abcd and zzww occur in both lists. Aligning the columns and putting all lists into one big list is simple. Please help me with a method to identify the duplicates and keep only the first occurence of them, based on the key field value. Note that these lists could have many thousands of entries and some entries occur many times, so the simplest method would be most appreciated for the sake of speed. For now, I am pulling the data into a full power text editor and that includes sort and remove duplicate functions. I think I can figure out how to do this in Excel with multiple sorts and 'if this cell is the same as one above it' things, but I am finding it is taking many many steps to accomplish this, specially if there are more than 2 duplicates. Is there a simple function like the POSIX uniq command to do this, preferably in one step? thanks -- Dave Peterson -- Dave Peterson |
removing dupliates from 2 lists with similar content
Hi Marc
In XL2007 there is a feature to Remove Duplicates under the Data tab. -- Regards Roger Govier "Marc the confused" wrote in message ... Thanks Dave. You have answered my question perfectly and I really appreciate it. I always suspected there had to be a way to do this built into Excel, but I never found it. De-duping lists like these has always been a task of heroic proportions for me from within Excel, which is why I always ended up breaking the list out and using external tools/ Shell scripts. Your advice will be a real boon when I need to stay inside Office. Sorry to ask so many questions. Is there a way to do this via formulae? Is my best course of action to do up a simple macro if I need to create a sheet that others can just drop their lists into? Thanks again. Marc "Dave Peterson" wrote: Build a giant list made up of all the other lists. Select the key column Data|Filter|advanced filter|Check unique records only Select your range and paste into a new sheet. The visible cells should be the only ones pasted onto that new sheet. excel-lookuper wrote: I have many large lists. Each contains unique entries, as well as duplicate entires. I need to combine all lists into one long list with no duplicates. Using one field as a primary key value, I want to remove all duplicate entries. Where the data is different but the key entry is the same, I only need to keep the first occurrence. e.g. List 1 key field | name | address | city abcd | A bunch of data, some empty fields..... efgh zzww qnbc List 2 key field | name | address | City| Phone Number kdkd | A bunch of data, some empty fields..... abcd werf asdf asdd zzww poio ---------------------- Note that entries abcd and zzww occur in both lists. Aligning the columns and putting all lists into one big list is simple. Please help me with a method to identify the duplicates and keep only the first occurence of them, based on the key field value. Note that these lists could have many thousands of entries and some entries occur many times, so the simplest method would be most appreciated for the sake of speed. For now, I am pulling the data into a full power text editor and that includes sort and remove duplicate functions. I think I can figure out how to do this in Excel with multiple sorts and 'if this cell is the same as one above it' things, but I am finding it is taking many many steps to accomplish this, specially if there are more than 2 duplicates. Is there a simple function like the POSIX uniq command to do this, preferably in one step? thanks -- Dave Peterson |
removing dupliates from 2 lists with similar content
Thanks again. I just adapted your example to a live document and it works
perfectly. Using formulae lets me automate processes without having to rely on clicking the menus. I find that my it is easier to give a spreadsheet template to coworkers than to have to train them to click on menus and follow procedures. I can say "Paste here", "drag down the formula in this column" and "see results there". Thanks too for that web site. I have bookmarked it... it is a great resource. The formulas are exactly what I needed. "Dave Peterson" wrote: I'm not sure how you'd approach multiple lists with only formulas. But I bet you could fiddle your way through it. I can't believe it would be quicker than data|filter|advanced filter, though. You may want to look at Chip Pearson's site: http://www.cpearson.com/excel/duplicat.htm He has lots of techniques there. Marc the confused wrote: Thanks Dave. You have answered my question perfectly and I really appreciate it. I always suspected there had to be a way to do this built into Excel, but I never found it. De-duping lists like these has always been a task of heroic proportions for me from within Excel, which is why I always ended up breaking the list out and using external tools/ Shell scripts. Your advice will be a real boon when I need to stay inside Office. Sorry to ask so many questions. Is there a way to do this via formulae? Is my best course of action to do up a simple macro if I need to create a sheet that others can just drop their lists into? Thanks again. Marc "Dave Peterson" wrote: Build a giant list made up of all the other lists. Select the key column Data|Filter|advanced filter|Check unique records only Select your range and paste into a new sheet. The visible cells should be the only ones pasted onto that new sheet. excel-lookuper wrote: I have many large lists. Each contains unique entries, as well as duplicate entires. I need to combine all lists into one long list with no duplicates. Using one field as a primary key value, I want to remove all duplicate entries. Where the data is different but the key entry is the same, I only need to keep the first occurrence. e.g. List 1 key field | name | address | city abcd | A bunch of data, some empty fields..... efgh zzww qnbc List 2 key field | name | address | City| Phone Number kdkd | A bunch of data, some empty fields..... abcd werf asdf asdd zzww poio ---------------------- Note that entries abcd and zzww occur in both lists. Aligning the columns and putting all lists into one big list is simple. Please help me with a method to identify the duplicates and keep only the first occurence of them, based on the key field value. Note that these lists could have many thousands of entries and some entries occur many times, so the simplest method would be most appreciated for the sake of speed. For now, I am pulling the data into a full power text editor and that includes sort and remove duplicate functions. I think I can figure out how to do this in Excel with multiple sorts and 'if this cell is the same as one above it' things, but I am finding it is taking many many steps to accomplish this, specially if there are more than 2 duplicates. Is there a simple function like the POSIX uniq command to do this, preferably in one step? thanks -- Dave Peterson -- Dave Peterson |
removing dupliates from 2 lists with similar content
Thanks for the tip Roger.
Microsoft is really demonstrating its commitment to innovation with Excel 2007 by introducing the "uniq" function that has been part of Unix since the 1970's ;-) Seriously though, Excel is a great product, and I actually prefer it to OpenOffice, Quatro or any of the others I have seen. Unfortunately, I own 97 and use 2000 at work. I simply don't have the cash to upgrade to 2007, so I am really grateful for the advice above. Marc "Roger Govier" wrote: Hi Marc In XL2007 there is a feature to Remove Duplicates under the Data tab. -- Regards Roger Govier "Marc the confused" wrote in message ... Thanks Dave. You have answered my question perfectly and I really appreciate it. I always suspected there had to be a way to do this built into Excel, but I never found it. De-duping lists like these has always been a task of heroic proportions for me from within Excel, which is why I always ended up breaking the list out and using external tools/ Shell scripts. Your advice will be a real boon when I need to stay inside Office. Sorry to ask so many questions. Is there a way to do this via formulae? Is my best course of action to do up a simple macro if I need to create a sheet that others can just drop their lists into? Thanks again. Marc "Dave Peterson" wrote: Build a giant list made up of all the other lists. Select the key column Data|Filter|advanced filter|Check unique records only Select your range and paste into a new sheet. The visible cells should be the only ones pasted onto that new sheet. excel-lookuper wrote: I have many large lists. Each contains unique entries, as well as duplicate entires. I need to combine all lists into one long list with no duplicates. Using one field as a primary key value, I want to remove all duplicate entries. Where the data is different but the key entry is the same, I only need to keep the first occurrence. e.g. List 1 key field | name | address | city abcd | A bunch of data, some empty fields..... efgh zzww qnbc List 2 key field | name | address | City| Phone Number kdkd | A bunch of data, some empty fields..... abcd werf asdf asdd zzww poio ---------------------- Note that entries abcd and zzww occur in both lists. Aligning the columns and putting all lists into one big list is simple. Please help me with a method to identify the duplicates and keep only the first occurence of them, based on the key field value. Note that these lists could have many thousands of entries and some entries occur many times, so the simplest method would be most appreciated for the sake of speed. For now, I am pulling the data into a full power text editor and that includes sort and remove duplicate functions. I think I can figure out how to do this in Excel with multiple sorts and 'if this cell is the same as one above it' things, but I am finding it is taking many many steps to accomplish this, specially if there are more than 2 duplicates. Is there a simple function like the POSIX uniq command to do this, preferably in one step? thanks -- Dave Peterson |
All times are GMT +1. The time now is 11:52 PM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com