Random selection of records based on specifc value
"Ixtreme" wrote:
On sheet "Stats" I have a few formulas that show me:
A The total rows used in Sheets "Data"
B A list of unique values from column [A] of the
"Data" sheet. Can be any number.
C A required sample needed for each item listed in [C]
[....]
The Data sheet looks like this
apple bla bla test test
[....]
What I would like is: randomly select (the required
number of sampes [C]) rows from the "Data" sheet for
each unique value I have listed in column [b].
Try the following macro. Make any appropriate changes to cell references.
The macro assumes that Data!A:A has at least one empty cell after the last
valid column. Otherwise, Stats!A2 should contain the number of valid data
columns, just as Stats!A1 contains the number of valid data rows.
It also assumes that the list of key (what you variously call "record ids"
or "unique values") is followed by at least one empty cell, or a cell with
an error, or a cell with the null string. That should give some flexibility
in how you create the list of keys.
The random results are copied starting in A1 in a new worksheet inserted
before the "Data" worksheet. The macro makes no effort to copy formats or
adjust column widths in the new worksheet.
-----
Option Explicit
Sub genSample()
Dim dSheet As Worksheet, sSheet As Worksheet
Dim nRow As Long, nCol As Long, nKey As Long
Dim nSelect As Long
Dim i As Long, d As Long, k As Long, r As Long
Dim c As Long
Dim data, myKey
' input data and stats
Set dSheet = Sheets("Data")
Set sSheet = Sheets("Stats")
nRow = sSheet.Range("a1")
With dSheet
nCol = .Range("a1").End(xlToRight).Column
data = .Range("a1").Resize(nRow, nCol)
End With
With sSheet
nKey = .Range("b1").End(xlDown).Row
myKey = .Range("b1").Resize(nKey, 2)
For i = 1 To nKey
If IsError(myKey(i, 1)) Then Exit For
If myKey(i, 1) = "" Then Exit For
nSelect = nSelect + myKey(i, 2)
Next
nKey = i - 1
End With
If nSelect = 0 Or nSelect nRow Then
MsgBox "error 1": Exit Sub
End If
' generate random results
ReDim res(1 To nSelect, 1 To nCol)
Randomize
For i = 1 To nSelect
d = 1 + Int(nRow * Rnd)
For k = 1 To nKey
If myKey(k, 1) = data(d, 1) Then Exit For
Next
If k <= nKey Then
If myKey(k, 2) 0 Then
r = r + 1
For c = 1 To nCol
res(r, c) = data(d, c)
Next
myKey(k, 2) = myKey(k, 2) - 1
End If
End If
If d < nRow Then
For c = 1 To nCol
data(d, c) = data(nRow, c)
Next
End If
nRow = nRow - 1
Next
' write results to new worksheet
dSheet.Select
Sheets.Add
Range("a1").Resize(nSelect, nCol) = res
End Sub
|