![]() |
Random selection of records based on specifc value
I would like to create sample files based on the total number of
unique values in column A. I use a vlookup value to calculate the required sample records I need. Suppose I have column A containing the following information: apples peers oranges apples peers etc etc. Then I calculate the unique values: apples = 200 peers = 450 oranges = 50 then I calculate: total samples needed = 100 then I calculate the samples needed based on the 'weight': apples = (200 / 700) * 100 = 29 peers = (450 / 700) * 100 = 64 oranges = (50 / 700) * 100 = 7 Here comes the problem: Through vba I automatically want to select 29 random 'apples' rows, 64 random peers rows and 7 random oranges rows and copy them to sheet2. Mark |
Random selection of records based on specifc value
"Ixtreme" wrote:
Through vba I automatically want to select 29 random 'apples' rows, 64 random peers rows and 7 random oranges rows and copy them to sheet2. Perhaps you can make use of the following, at least as a start. This code assumes that each row of data consists of 3 columns: A, B and C. And there are 1000 rows of data starting in row 2. Note: Although the selection of each apple, orange and pear row is random, the output consists of all apple samples, then all orange samples, then all pear samples. If you want those shuffled, some additional work is needed. ----- Option Explicit Sub genData() Const nSample As Long = 100 Dim s(1 To nSample, 1 To 3), v Dim n As Long, i As Long Dim nApple As Long, nOrange As Long, nPear As Long Dim sApple As Long, sOrange As Long, sPear As Long ' input data v = Sheets("Sheet1").Range("a2:c1001") n = UBound(v, 1) ReDim apple(1 To n) As Long ReDim orange(1 To n) As Long ReDim pear(1 To n) As Long nApple = 0: nOrange = 0: nPear = 0 For i = 1 To n Select Case v(i, 1) Case "apple" nApple = nApple + 1 apple(nApple) = i Case "orange" nOrange = nOrange + 1 orange(nOrange) = i Case "pear" nPear = nPear + 1 pear(nPear) = i End Select Next ' select random data If nSample nApple + nOrange + nPear Then MsgBox "error" Exit Sub End If Randomize sApple = Int(nApple / n * nSample) sOrange = Int(nOrange / n * nSample) sPear = nSample - sApple - sOrange doSelect sApple, apple, nApple, v, s, 0 doSelect sOrange, orange, nOrange, v, s, sApple doSelect sPear, pear, nPear, v, s, sApple + sOrange Sheets("sheet2").Range("a1", "c" & nSample) = s End Sub Private Sub doSelect(ByVal nSample As Long, myRow0() As Long, _ ByVal nRow As Long, v, s(), ByVal i As Long) Dim j As Long, x As Long, r As Long ReDim myRow(1 To nRow) As Long If nSample <= 0 Then Exit Sub For j = 1 To nRow: myRow(j) = myRow0(j): Next Do x = 1 + Int(nSample * Rnd) r = myRow(x) i = i + 1 s(i, 1) = v(r, 1) s(i, 2) = v(r, 2) s(i, 3) = v(r, 3) If x < nSample Then myRow(x) = myRow(nSample) nSample = nSample - 1 Loop Until nSample = 0 End Sub |
Random selection of records based on specifc value
Thanks for your answer. It is not exactly what I meant. I did some
clean up and think that I found a better setup. I have now 2 sheets. Sheet 'Data" which contains all my data where the values in column [A] are the record id's (the apples, pears, oranges etc etc). On sheet "Stats" I have a few formulas that show me: A The total rows used in Sheets "Data" B A list of unique values from column [A] of the "Data" sheet. Can be any number. C A required sample needed for each item listed in [C] A B C 200 rows apple 10 pear 20 .... .... etc etc The Data sheet looks like this apple bla bla test test orange sdjsdjh sdkj apple dfjkdf dfkjdf pear jsdkjs lkslksd ..... etc etc What I would like is: randomly select (the required number of sampes [C]) rows from the "Data" sheet for each unique value I have listed in column [b]. So I need some for each loop I quess that first checks how many unique items there are and then copy the correct number of required samples from the "Data" sheet into the "Stats" sheet or a new sheet. |
Random selection of records based on specifc value
"Ixtreme" wrote:
On sheet "Stats" I have a few formulas that show me: A The total rows used in Sheets "Data" B A list of unique values from column [A] of the "Data" sheet. Can be any number. C A required sample needed for each item listed in [C] [....] The Data sheet looks like this apple bla bla test test [....] What I would like is: randomly select (the required number of sampes [C]) rows from the "Data" sheet for each unique value I have listed in column [b]. Try the following macro. Make any appropriate changes to cell references. The macro assumes that Data!A:A has at least one empty cell after the last valid column. Otherwise, Stats!A2 should contain the number of valid data columns, just as Stats!A1 contains the number of valid data rows. It also assumes that the list of key (what you variously call "record ids" or "unique values") is followed by at least one empty cell, or a cell with an error, or a cell with the null string. That should give some flexibility in how you create the list of keys. The random results are copied starting in A1 in a new worksheet inserted before the "Data" worksheet. The macro makes no effort to copy formats or adjust column widths in the new worksheet. ----- Option Explicit Sub genSample() Dim dSheet As Worksheet, sSheet As Worksheet Dim nRow As Long, nCol As Long, nKey As Long Dim nSelect As Long Dim i As Long, d As Long, k As Long, r As Long Dim c As Long Dim data, myKey ' input data and stats Set dSheet = Sheets("Data") Set sSheet = Sheets("Stats") nRow = sSheet.Range("a1") With dSheet nCol = .Range("a1").End(xlToRight).Column data = .Range("a1").Resize(nRow, nCol) End With With sSheet nKey = .Range("b1").End(xlDown).Row myKey = .Range("b1").Resize(nKey, 2) For i = 1 To nKey If IsError(myKey(i, 1)) Then Exit For If myKey(i, 1) = "" Then Exit For nSelect = nSelect + myKey(i, 2) Next nKey = i - 1 End With If nSelect = 0 Or nSelect nRow Then MsgBox "error 1": Exit Sub End If ' generate random results ReDim res(1 To nSelect, 1 To nCol) Randomize For i = 1 To nSelect d = 1 + Int(nRow * Rnd) For k = 1 To nKey If myKey(k, 1) = data(d, 1) Then Exit For Next If k <= nKey Then If myKey(k, 2) 0 Then r = r + 1 For c = 1 To nCol res(r, c) = data(d, c) Next myKey(k, 2) = myKey(k, 2) - 1 End If End If If d < nRow Then For c = 1 To nCol data(d, c) = data(nRow, c) Next End If nRow = nRow - 1 Next ' write results to new worksheet dSheet.Select Sheets.Add Range("a1").Resize(nSelect, nCol) = res End Sub |
Random selection of records based on specifc value
Errata....
"joeu2004" wrote: For i = 1 To nSelect Sorry. I failed to notice that the loop does not always generate the correct number of samples. See the attached macro. The for-loop is replaced by a do-loop. ----- Option Explicit Sub genSample() Dim dSheet As Worksheet, sSheet As Worksheet Dim nRow As Long, nCol As Long, nKey As Long Dim nSelect As Long Dim i As Long, d As Long, k As Long, r As Long Dim c As Long Dim data, myKey ' input data and stats Set dSheet = Sheets("Data") Set sSheet = Sheets("Stats") nRow = sSheet.Range("a1") With dSheet nCol = .Range("a1").End(xlToRight).Column data = .Range("a1").Resize(nRow, nCol) End With With sSheet nKey = .Range("b1").End(xlDown).Row myKey = .Range("b1").Resize(nKey, 2) nSelect = 0 For i = 1 To nKey If IsError(myKey(i, 1)) Then Exit For If myKey(i, 1) = "" Then Exit For nSelect = nSelect + myKey(i, 2) Next nKey = i - 1 End With If nSelect = 0 Or nSelect nRow Then MsgBox "error 1": Exit Sub End If ' generate random results ReDim res(1 To nSelect, 1 To nCol) Randomize r = 0 Do d = 1 + Int(nRow * Rnd) For k = 1 To nKey If myKey(k, 1) = data(d, 1) Then Exit For Next If k <= nKey Then If myKey(k, 2) 0 Then r = r + 1 For c = 1 To nCol res(r, c) = data(d, c) Next myKey(k, 2) = myKey(k, 2) - 1 If r = nSelect Then Exit Do End If End If If d < nRow Then For c = 1 To nCol data(d, c) = data(nRow, c) Next End If nRow = nRow - 1 Loop ' write results to new worksheet dSheet.Select Sheets.Add Range("a1").Resize(nSelect, nCol) = res End Sub |
Random selection of records based on specifc value
Perfect!
Thanks a lot. This is exactly what I needed. |
All times are GMT +1. The time now is 07:16 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com