Home |
Search |
Today's Posts |
|
#1
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Since you guys were so helpful, I hate to wear out my welcome, but...
Now that I have the high-index data from my previous question, I need to process the list of words to eliminate duplicates. The table below shows some sample data. The first column is the worksheet row numbers. The Word column (B) contains the words. The Score column (C) contains a count of how often that words occurs is a large collection of text (400 million words). The Index column (D) contains the result of the GetHighIndex UDF from my previous post. I need another UDF to walk down the list looking for duplicate words (tee, a, in). When it finds a duplicate, it will merge the two rows into one by adding the Scores and deleting the second row. In the example below, rows 5 & 6 would be combined to form one row with a Score of 3301 (199 + 3102). Rows 62-65 would be combined into one row with a Score of 7,140,219 (808 + 9,711 + 279,364 + 6,850,336). Only exact matches are combined. The two "balloon" rows would be combined as would the two "ballooning" rows, but would not include the "balloonist" row. Can I impose on someone to get this started? The parts I am not sure how to do inside a UDF a 1. How to step through the rows and address the cells (relatively). 2. How to delete a row. B C D 4 Word Score Index 5 tee 199 2 6 tee 3,102 2 7 a 298 3 8 a 9,996,626 3 9 at 1,730,609 3 10 eat 69,484 3 62 in 808 6 63 in 9,711 6 64 in 279,364 6 65 in 6,850,336 6 2054 dear 97 11 2055 dear 2,015 11 2056 dear 3,364 11 2057 dear 8,417 11 32159 balloon 536 20 32160 balloon 4,887 20 32161 ballooning 28 20 32162 ballooning 82 20 32163 balloonist 51 20 48196 turquoise 435 26 48197 turquoise 718 26 49270 ad-lib 29 27 49271 ad-lib 46 27 49272 ad-libbed 40 27 49273 ad-libbing 19 27 |
#2
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
"Henrietta Horne" wrote in message
... Since you guys were so helpful, I hate to wear out my welcome, but... <snip Can I impose on someone to get this started? The parts I am not sure how to do inside a UDF a 1. How to step through the rows and address the cells (relatively). The working procedure I'm posting below might give you some ideas to work with, even though I'm making no attempt to apply it to your question <grin [since you show a desire to learn! :) ] 2. How to delete a row. one possibility: Range("A5").Select Selection.EntireRow.Delete (code the result of recording a macro while manually selecting a cell and using the UI to delete the row) ========== begin code Sub Copy2BreakTable() ' 1/07/11 cm remove auto-filter; use range for find ' 1/25/10 cm add auto-filter on Plant ID ... partially tested ' 08 29 09 cm disable screen updates during startup Const newRow = 2 ' New Data row Const lookRows = 300 ' Only search newest rows Dim ErrNum As Variant Dim newDate, thisDate, currentDate Dim newPlant, newLotID Dim x2, x$, x1$ Dim SearchBegin, SearchEnd, SearchDirection Dim firstRow, lastRow, currentRow ' cylinder data range 'Dim bPlantFilter As Boolean '' turn off slow screen updating during searching ' Application.ScreenUpdating = False Dim screenUpdateState As Boolean Dim statusBarState As Boolean Dim calcState As XlCalculation Dim eventsState As Boolean Dim displayPageBreakState As Boolean With Application screenUpdateState = .ScreenUpdating statusBarState = .DisplayStatusBar calcState = .Calculation eventsState = .EnableEvents End With 'turn off some Excel functionality so your code runs faster With Application .ScreenUpdating = False .DisplayStatusBar = False .Calculation = xlCalculationManual .EnableEvents = False End With Set c = Nothing ' insure exit test is valid If newSub() Then ' newSub returns False if activesheet not valid cylData ' copies new cylinder measurements to row newRow; ' returns with BreakTable!Data as ActiveSheet ' Look in last lookRows rows for match in LotID ' Begin search at current row; look up or down depending ' on date comparison; begin at last row if current row ' is outside of lookRows range ' gather data of interest... newPlant = Cells(newRow, BreakTableColumns.Plant).Value newLotID = Cells(newRow, BreakTableColumns.LotID).Value newDate = Cells(newRow, BreakTableColumns.SampleDate).Value Set r = Range("cylData") ' all cylinder data firstRow = r.Row lastRow = r.Rows.Count + firstRow - 1 'xx' marker row SearchEnd = lastRow SearchBegin = lastRow - lookRows If SearchBegin < firstRow Then SearchBegin = firstRow End If currentRow = Selection.Row If currentRow < SearchBegin Or _ currentRow = SearchEnd Then ' search from the bottom up - ' new break data tends to be recent currentRow = SearchEnd End If Select Case newPlant Case "MCSS" Case "LCP" ' special processing for LCP worksheet lotID If Mid(newLotID, 7, 1) = "+" Then newLotID = Left(newLotID, 7) ElseIf Right(newLotID, 2) = "++" Then newLotID = Left(newLotID, 6) & "++" Else newLotID = Left(newLotID, 6) End If Cells(newRow, BreakTableColumns.LotID).Value = newLotID newLotID = "xxxx" ' force search on date Case Else If Right(newLotID, 1) = "+" Then newLotID = Left(newLotID, Len(newLotID) - 1) End If End Select ' Prepare to search for match on LotID ' search 'up' or 'down' depending on new date currentDate = Cells(currentRow, BreakTableColumns.SampleDate).Value Select Case DateDiff("d", currentDate, newDate) Case Is < 0 ' newDate is older SearchDirection = xlPrevious Case Is = 0 ' newDate is newer SearchDirection = xlNext End Select ' define search range Set r = Range(Cells(SearchBegin, BreakTableColumns.LotID), _ Cells(SearchEnd, BreakTableColumns.LotID)) Set c = Cells(currentRow, BreakTableColumns.LotID) Set c = r.Find(What:=newLotID, After:=c, _ LookIn:=xlValues, LookAt:=xlPart, _ SearchOrder:=xlByRows, _ SearchDirection:=SearchDirection, _ MatchCase:=False, SearchFormat:=False) If c Is Nothing Then 'lot not found! -- look for date Application.Speech.Speak "Lot not found", -1, 0, -1 For x2 = SearchBegin To SearchEnd thisDate = Cells(x2, BreakTableColumns.SampleDate).Value If IsEmpty(thisDate) Then Exit For Select Case DateDiff("d", thisDate, newDate) Case Is < 0 ' thisDate is future Exit For Case 0 ' same date If DatePart("h", thisDate) = 0 Then Exit For ' No Sample time (MCSS) so stop here End If ' else ignore time; continue search End Select Next x2 ' SearchBegin To SearchEnd Set c = Cells(x2, BreakTableColumns.SampleDate) End If ' c Is Nothing Then lot not found! -- look for date With c ' scroll to selected row .Show .EntireRow.Select End With Select Case newPlant Case "MCSS" 'Set MCSS Summary not printed flag flagMCSS = True With Selection.Cells(1, 5).Interior .ColorIndex = 6 .Pattern = xlSolid End With With Cells(4, 5) .Value = "MCSS Summary not printed" With .Interior .ColorIndex = 6 .Pattern = xlSolid End With End With End Select End If '' restore screen updates 'put this at the end of your code With Application .Calculation = calcState ' cancels cut/copy mode .EnableEvents = eventsState .DisplayStatusBar = statusBarState .ScreenUpdating = screenUpdateState End With If Not c Is Nothing Then Rows(newRow).Copy End If Set c = Nothing Set r = Nothing End Sub -- Clif McIrvin (clare reads his mail with moe, nomail feeds the bit bucket :-) |
#3
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Tue, 11 Jan 2011 13:49:34 -0600, "Clif McIrvin"
wrote: "Henrietta Horne" wrote in message .. . Since you guys were so helpful, I hate to wear out my welcome, but... <snip Can I impose on someone to get this started? The parts I am not sure how to do inside a UDF a 1. How to step through the rows and address the cells (relatively). The working procedure I'm posting below might give you some ideas to work with, even though I'm making no attempt to apply it to your question <grin [since you show a desire to learn! :) ] 2. How to delete a row. one possibility: Range("A5").Select Selection.EntireRow.Delete (code the result of recording a macro while manually selecting a cell and using the UI to delete the row) ========== begin code Sub Copy2BreakTable() ' 1/07/11 cm remove auto-filter; use range for find ' 1/25/10 cm add auto-filter on Plant ID ... partially tested ' 08 29 09 cm disable screen updates during startup Const newRow = 2 ' New Data row Const lookRows = 300 ' Only search newest rows Dim ErrNum As Variant Dim newDate, thisDate, currentDate Dim newPlant, newLotID Dim x2, x$, x1$ Dim SearchBegin, SearchEnd, SearchDirection Dim firstRow, lastRow, currentRow ' cylinder data range 'Dim bPlantFilter As Boolean '' turn off slow screen updating during searching ' Application.ScreenUpdating = False Dim screenUpdateState As Boolean Dim statusBarState As Boolean Dim calcState As XlCalculation Dim eventsState As Boolean Dim displayPageBreakState As Boolean With Application screenUpdateState = .ScreenUpdating statusBarState = .DisplayStatusBar calcState = .Calculation eventsState = .EnableEvents End With 'turn off some Excel functionality so your code runs faster With Application .ScreenUpdating = False .DisplayStatusBar = False .Calculation = xlCalculationManual .EnableEvents = False End With Set c = Nothing ' insure exit test is valid If newSub() Then ' newSub returns False if activesheet not valid cylData ' copies new cylinder measurements to row newRow; ' returns with BreakTable!Data as ActiveSheet ' Look in last lookRows rows for match in LotID ' Begin search at current row; look up or down depending ' on date comparison; begin at last row if current row ' is outside of lookRows range ' gather data of interest... newPlant = Cells(newRow, BreakTableColumns.Plant).Value newLotID = Cells(newRow, BreakTableColumns.LotID).Value newDate = Cells(newRow, BreakTableColumns.SampleDate).Value Set r = Range("cylData") ' all cylinder data firstRow = r.Row lastRow = r.Rows.Count + firstRow - 1 'xx' marker row SearchEnd = lastRow SearchBegin = lastRow - lookRows If SearchBegin < firstRow Then SearchBegin = firstRow End If currentRow = Selection.Row If currentRow < SearchBegin Or _ currentRow = SearchEnd Then ' search from the bottom up - ' new break data tends to be recent currentRow = SearchEnd End If Select Case newPlant Case "MCSS" Case "LCP" ' special processing for LCP worksheet lotID If Mid(newLotID, 7, 1) = "+" Then newLotID = Left(newLotID, 7) ElseIf Right(newLotID, 2) = "++" Then newLotID = Left(newLotID, 6) & "++" Else newLotID = Left(newLotID, 6) End If Cells(newRow, BreakTableColumns.LotID).Value = newLotID newLotID = "xxxx" ' force search on date Case Else If Right(newLotID, 1) = "+" Then newLotID = Left(newLotID, Len(newLotID) - 1) End If End Select ' Prepare to search for match on LotID ' search 'up' or 'down' depending on new date currentDate = Cells(currentRow, BreakTableColumns.SampleDate).Value Select Case DateDiff("d", currentDate, newDate) Case Is < 0 ' newDate is older SearchDirection = xlPrevious Case Is = 0 ' newDate is newer SearchDirection = xlNext End Select ' define search range Set r = Range(Cells(SearchBegin, BreakTableColumns.LotID), _ Cells(SearchEnd, BreakTableColumns.LotID)) Set c = Cells(currentRow, BreakTableColumns.LotID) Set c = r.Find(What:=newLotID, After:=c, _ LookIn:=xlValues, LookAt:=xlPart, _ SearchOrder:=xlByRows, _ SearchDirection:=SearchDirection, _ MatchCase:=False, SearchFormat:=False) If c Is Nothing Then 'lot not found! -- look for date Application.Speech.Speak "Lot not found", -1, 0, -1 For x2 = SearchBegin To SearchEnd thisDate = Cells(x2, BreakTableColumns.SampleDate).Value If IsEmpty(thisDate) Then Exit For Select Case DateDiff("d", thisDate, newDate) Case Is < 0 ' thisDate is future Exit For Case 0 ' same date If DatePart("h", thisDate) = 0 Then Exit For ' No Sample time (MCSS) so stop here End If ' else ignore time; continue search End Select Next x2 ' SearchBegin To SearchEnd Set c = Cells(x2, BreakTableColumns.SampleDate) End If ' c Is Nothing Then lot not found! -- look for date With c ' scroll to selected row .Show .EntireRow.Select End With Select Case newPlant Case "MCSS" 'Set MCSS Summary not printed flag flagMCSS = True With Selection.Cells(1, 5).Interior .ColorIndex = 6 .Pattern = xlSolid End With With Cells(4, 5) .Value = "MCSS Summary not printed" With .Interior .ColorIndex = 6 .Pattern = xlSolid End With End With End Select End If '' restore screen updates 'put this at the end of your code With Application .Calculation = calcState ' cancels cut/copy mode .EnableEvents = eventsState .DisplayStatusBar = statusBarState .ScreenUpdating = screenUpdateState End With If Not c Is Nothing Then Rows(newRow).Copy End If Set c = Nothing Set r = Nothing End Sub Wow. I'll need a few days to digest that. Did you just type that up on the fly or did you already have some similar code? |
#4
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
"Henrietta Horne" wrote in message
... On Tue, 11 Jan 2011 13:49:34 -0600, "Clif McIrvin" wrote: "Henrietta Horne" wrote in message . .. Since you guys were so helpful, I hate to wear out my welcome, but... <snip Can I impose on someone to get this started? <big snip Wow. I'll need a few days to digest that. Did you just type that up on the fly or did you already have some similar code? It happens that I'd just done some fine-tuning on that routine to speed it up, so I just opened the module and did a copy / paste for you <g. Feel free to come back with questions ... I didn't really offer any explanation. That code is taking new data that has been placed into row 2 and finding the correct place in the table for that new data. When originally written, it was based on a recorded macro and used lots of ..Select and .Activate ... and as the table grew it got slower and slower. Now, it finishes in sub-second times. Oh ... that routine was one of my very first ventures into VBA, and what it is now reflects a lot of what I've learned in the three years since. Have fun with it! -- Clif McIrvin (clare reads his mail with moe, nomail feeds the bit bucket :-) |
#5
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Jan 11, 1:12*pm, Henrietta Horne wrote:
Since you guys were so helpful, I hate to wear out my welcome, but... Now that I have the high-index data from my previous question, I need to process the list of words to eliminate duplicates. The table below shows some sample data. The first column is the worksheet row numbers. The Word column (B) contains the words. The Score column (C) contains a count of how often that words occurs is a large collection of text (400 million words). The Index column (D) contains the result of the GetHighIndex UDF from my previous post. I need another UDF to walk down the list looking for duplicate words (tee, a, in). When it finds a duplicate, it will merge the two rows into one by adding the Scores and deleting the second row. In the example below, rows 5 & 6 would be combined to form one row with a Score of 3301 (199 + 3102). Rows 62-65 would be combined into one row with a Score of 7,140,219 (808 + 9,711 + 279,364 + 6,850,336). Only exact matches are combined. The two "balloon" rows would be combined as would the two "ballooning" rows, but would not include the "balloonist" row. Can I impose on someone to get this started? The parts I am not sure how to do inside a UDF a 1. How to step through the rows and address the cells (relatively). 2. How to delete a row. * * * * * *B * * * * C * * *D * * 4 * *Word * * *Score *Index * * 5 *tee * * * * * 199 * * 2 * * 6 *tee * * * * 3,102 * * 2 * * 7 *a * * * * * * 298 * * 3 * * 8 *a * * * 9,996,626 * * 3 * * 9 *at * * *1,730,609 * * 3 * *10 *eat * * * *69,484 * * 3 * *62 *in * * * * * *808 * * 6 * *63 *in * * * * *9,711 * * 6 * *64 *in * * * *279,364 * * 6 * *65 *in * * *6,850,336 * * 6 *2054 *dear * * * * * 97 * *11 *2055 *dear * * * *2,015 * *11 *2056 *dear * * * *3,364 * *11 *2057 *dear * * * *8,417 * *11 32159 *balloon * * * 536 * *20 32160 *balloon * * 4,887 * *20 32161 *ballooning * * 28 * *20 32162 *ballooning * * 82 * *20 32163 *balloonist * * 51 * *20 48196 *turquoise * * 435 * *26 48197 *turquoise * * 718 * *26 49270 *ad-lib * * * * 29 * *27 49271 *ad-lib * * * * 46 * *27 49272 *ad-libbed * * *40 * *27 49273 *ad-libbing * * 19 * *27 Trying to copy your data for testing didn't work so what you need is a looping macro from the bottom up to do this. "If desired, send your file to dguillett @gmail.com I will only look if: 1. You send a copy of this message on an inserted sheet 2. You give me the newsgroup and the subject line 3. You send a clear explanation of what you want 4. You send before/after examples and expected results." |
#6
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Tue, 11 Jan 2011 15:12:06 -0800 (PST), Don Guillett Excel MVP
wrote: On Jan 11, 1:12*pm, Henrietta Horne wrote: Since you guys were so helpful, I hate to wear out my welcome, but... Now that I have the high-index data from my previous question, I need to process the list of words to eliminate duplicates. The table below shows some sample data. The first column is the worksheet row numbers. The Word column (B) contains the words. The Score column (C) contains a count of how often that words occurs is a large collection of text (400 million words). The Index column (D) contains the result of the GetHighIndex UDF from my previous post. I need another UDF to walk down the list looking for duplicate words (tee, a, in). When it finds a duplicate, it will merge the two rows into one by adding the Scores and deleting the second row. In the example below, rows 5 & 6 would be combined to form one row with a Score of 3301 (199 + 3102). Rows 62-65 would be combined into one row with a Score of 7,140,219 (808 + 9,711 + 279,364 + 6,850,336). Only exact matches are combined. The two "balloon" rows would be combined as would the two "ballooning" rows, but would not include the "balloonist" row. Can I impose on someone to get this started? The parts I am not sure how to do inside a UDF a 1. How to step through the rows and address the cells (relatively). 2. How to delete a row. * * * * * *B * * * * C * * *D * * 4 * *Word * * *Score *Index * * 5 *tee * * * * * 199 * * 2 * * 6 *tee * * * * 3,102 * * 2 * * 7 *a * * * * * * 298 * * 3 * * 8 *a * * * 9,996,626 * * 3 * * 9 *at * * *1,730,609 * * 3 * *10 *eat * * * *69,484 * * 3 * *62 *in * * * * * *808 * * 6 * *63 *in * * * * *9,711 * * 6 * *64 *in * * * *279,364 * * 6 * *65 *in * * *6,850,336 * * 6 *2054 *dear * * * * * 97 * *11 *2055 *dear * * * *2,015 * *11 *2056 *dear * * * *3,364 * *11 *2057 *dear * * * *8,417 * *11 32159 *balloon * * * 536 * *20 32160 *balloon * * 4,887 * *20 32161 *ballooning * * 28 * *20 32162 *ballooning * * 82 * *20 32163 *balloonist * * 51 * *20 48196 *turquoise * * 435 * *26 48197 *turquoise * * 718 * *26 49270 *ad-lib * * * * 29 * *27 49271 *ad-lib * * * * 46 * *27 49272 *ad-libbed * * *40 * *27 49273 *ad-libbing * * 19 * *27 Trying to copy your data for testing didn't work so what you need is a looping macro from the bottom up to do this. "If desired, send your file to dguillett @gmail.com I will only look if: 1. You send a copy of this message on an inserted sheet 2. You give me the newsgroup and the subject line 3. You send a clear explanation of what you want 4. You send before/after examples and expected results." I'll try it myself first. I (when) I fail, I may send it over. Thanks. |
#8
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Wed, 12 Jan 2011 03:00:53 +0100, "Charabeuh"
wrote: Hello, Another way without VBA: I assumed your data are in columns A,B,C (e.g. Word,Score,Index <== A1 to C10000) Select an empty cell on your sheet ( e.g. cell F1) Select in your menu 'Data' the command 'Consolidate' Into the windows: - select in the list function "Sum" - select in the textbox "Référence" the area of your data to sum (i.e. A1:B10000) - click 'Add' - then select the option 'Top row' and 'Left column') - clic 'OK' You will get a table with the sum of score. Into cell F2, put the following formula to get your index: =MAX(FIND(MID(F2,ROW(INDIRECT("1:" & LEN(F2))),1),"etaoinsrhldcumgfpwybvkjxzq-")) (this is an array formula, you should validate this formula with the three keys CTRL+SHIFT+Enter instead of the single key Enter) Then drag down this formula to the end of your consolidated data. Thanks. I'll look into this solution, too. |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
Merge 2 excel worksheets into one and remove duplicate entries. | Excel Worksheet Functions | |||
Mail Merge - file has duplicate data | Excel Discussion (Misc queries) | |||
Select and merge cells with duplicate values | Excel Programming | |||
How to create duplicate labels with mail merge? | Excel Discussion (Misc queries) | |||
deleting duplicate records in a mail merge | Excel Discussion (Misc queries) |