Home |
Search |
Today's Posts |
|
#1
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Jan 11, 1:12*pm, Henrietta Horne wrote:
Since you guys were so helpful, I hate to wear out my welcome, but... Now that I have the high-index data from my previous question, I need to process the list of words to eliminate duplicates. The table below shows some sample data. The first column is the worksheet row numbers. The Word column (B) contains the words. The Score column (C) contains a count of how often that words occurs is a large collection of text (400 million words). The Index column (D) contains the result of the GetHighIndex UDF from my previous post. I need another UDF to walk down the list looking for duplicate words (tee, a, in). When it finds a duplicate, it will merge the two rows into one by adding the Scores and deleting the second row. In the example below, rows 5 & 6 would be combined to form one row with a Score of 3301 (199 + 3102). Rows 62-65 would be combined into one row with a Score of 7,140,219 (808 + 9,711 + 279,364 + 6,850,336). Only exact matches are combined. The two "balloon" rows would be combined as would the two "ballooning" rows, but would not include the "balloonist" row. Can I impose on someone to get this started? The parts I am not sure how to do inside a UDF a 1. How to step through the rows and address the cells (relatively). 2. How to delete a row. * * * * * *B * * * * C * * *D * * 4 * *Word * * *Score *Index * * 5 *tee * * * * * 199 * * 2 * * 6 *tee * * * * 3,102 * * 2 * * 7 *a * * * * * * 298 * * 3 * * 8 *a * * * 9,996,626 * * 3 * * 9 *at * * *1,730,609 * * 3 * *10 *eat * * * *69,484 * * 3 * *62 *in * * * * * *808 * * 6 * *63 *in * * * * *9,711 * * 6 * *64 *in * * * *279,364 * * 6 * *65 *in * * *6,850,336 * * 6 *2054 *dear * * * * * 97 * *11 *2055 *dear * * * *2,015 * *11 *2056 *dear * * * *3,364 * *11 *2057 *dear * * * *8,417 * *11 32159 *balloon * * * 536 * *20 32160 *balloon * * 4,887 * *20 32161 *ballooning * * 28 * *20 32162 *ballooning * * 82 * *20 32163 *balloonist * * 51 * *20 48196 *turquoise * * 435 * *26 48197 *turquoise * * 718 * *26 49270 *ad-lib * * * * 29 * *27 49271 *ad-lib * * * * 46 * *27 49272 *ad-libbed * * *40 * *27 49273 *ad-libbing * * 19 * *27 Trying to copy your data for testing didn't work so what you need is a looping macro from the bottom up to do this. "If desired, send your file to dguillett @gmail.com I will only look if: 1. You send a copy of this message on an inserted sheet 2. You give me the newsgroup and the subject line 3. You send a clear explanation of what you want 4. You send before/after examples and expected results." |
#2
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Tue, 11 Jan 2011 15:12:06 -0800 (PST), Don Guillett Excel MVP
wrote: On Jan 11, 1:12*pm, Henrietta Horne wrote: Since you guys were so helpful, I hate to wear out my welcome, but... Now that I have the high-index data from my previous question, I need to process the list of words to eliminate duplicates. The table below shows some sample data. The first column is the worksheet row numbers. The Word column (B) contains the words. The Score column (C) contains a count of how often that words occurs is a large collection of text (400 million words). The Index column (D) contains the result of the GetHighIndex UDF from my previous post. I need another UDF to walk down the list looking for duplicate words (tee, a, in). When it finds a duplicate, it will merge the two rows into one by adding the Scores and deleting the second row. In the example below, rows 5 & 6 would be combined to form one row with a Score of 3301 (199 + 3102). Rows 62-65 would be combined into one row with a Score of 7,140,219 (808 + 9,711 + 279,364 + 6,850,336). Only exact matches are combined. The two "balloon" rows would be combined as would the two "ballooning" rows, but would not include the "balloonist" row. Can I impose on someone to get this started? The parts I am not sure how to do inside a UDF a 1. How to step through the rows and address the cells (relatively). 2. How to delete a row. * * * * * *B * * * * C * * *D * * 4 * *Word * * *Score *Index * * 5 *tee * * * * * 199 * * 2 * * 6 *tee * * * * 3,102 * * 2 * * 7 *a * * * * * * 298 * * 3 * * 8 *a * * * 9,996,626 * * 3 * * 9 *at * * *1,730,609 * * 3 * *10 *eat * * * *69,484 * * 3 * *62 *in * * * * * *808 * * 6 * *63 *in * * * * *9,711 * * 6 * *64 *in * * * *279,364 * * 6 * *65 *in * * *6,850,336 * * 6 *2054 *dear * * * * * 97 * *11 *2055 *dear * * * *2,015 * *11 *2056 *dear * * * *3,364 * *11 *2057 *dear * * * *8,417 * *11 32159 *balloon * * * 536 * *20 32160 *balloon * * 4,887 * *20 32161 *ballooning * * 28 * *20 32162 *ballooning * * 82 * *20 32163 *balloonist * * 51 * *20 48196 *turquoise * * 435 * *26 48197 *turquoise * * 718 * *26 49270 *ad-lib * * * * 29 * *27 49271 *ad-lib * * * * 46 * *27 49272 *ad-libbed * * *40 * *27 49273 *ad-libbing * * 19 * *27 Trying to copy your data for testing didn't work so what you need is a looping macro from the bottom up to do this. "If desired, send your file to dguillett @gmail.com I will only look if: 1. You send a copy of this message on an inserted sheet 2. You give me the newsgroup and the subject line 3. You send a clear explanation of what you want 4. You send before/after examples and expected results." I'll try it myself first. I (when) I fail, I may send it over. Thanks. |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
Merge 2 excel worksheets into one and remove duplicate entries. | Excel Worksheet Functions | |||
Mail Merge - file has duplicate data | Excel Discussion (Misc queries) | |||
Select and merge cells with duplicate values | Excel Programming | |||
How to create duplicate labels with mail merge? | Excel Discussion (Misc queries) | |||
deleting duplicate records in a mail merge | Excel Discussion (Misc queries) |