Home |
Search |
Today's Posts |
#1
![]()
Posted to microsoft.public.excel.misc,microsoft.public.excel.programming
|
|||
|
|||
![]()
I have a data extraction / cleansing problem.
A computer help desk system has ticket information with textual detail placed in a field called 'Comments'. This data has been extracted to Excel spreadsheets for the last few months. I cannot know exactly where the 'Comments' column may be in each spreadsheet, however I know that the ending column location for each row is 'BA'. We need to extract phone numbers and extensions out of this data. Phone numbers can be 10 digit telephone numbers such as: 123-123-1234 or 1231231234. The 10 digit telephone numbers can have 4 or 5 digit extensions (or PBX stations) such as 12345 or 1234. The telephone combination at times can be put together such as 1231231234/12345. Phone numbers can also be 7 digit (local) numbers such as 1231234 or 123-1234. This information is intermixed with text and I need to extract the phone numbers out of the Comments. Comments are like this: "Joe Blow 1231231234/12345 needs to have his extension moved to new desk. Call site manager Jack Black 1231232345/09876 for access." * I need to pull out "1231231234", "12345", "1231232345" and "09876" from the above and put the extracted data at the end of the spreadsheet row. Given the end of the row is at BA, and extracted numbers from the above, put "1231231234" in that row's BB column, "12345" in BC column, "1231232345" in BD column and "09876" in BE column. "Add VM 1234, updated dictionary, added 2345 & 2346. Completed by Jay Smith." * I need to pull out "1234", "2345" and "2346" and put "1234" in BB column, "2345" in BC column and "2346" in BD column. "AS000AD00SK000DD000R0E0R0WQ0E0D0D0 Jane Smith 4441231237. Setup vmail for ext 1237. Added to call pickup group as 4948." * I need to pull out "4441231237", "1237" and "2346" and put "4441231237" in BB column, "1237" in BC column and "2346" in BD column. "4441234567 thru 7890 have been added to VNET Thanks!" * I need to pull out "4441234567" and "7890" and put "4441234567" in BB column and "7890" in BC column. I have found a VBA utility that can pull numbers out of a string from http://www.google.com/groups?hl=en&l...wsranger.c om (or http://www.mvps.org/dmcritchie/excel..._digitsid.htm), but that only returns one number. I was thinking about adding a parent function which would get the column, tokenize the contents of 'Comments' on space, check the length of the tokenized sting content length 3 and if it is pass the tokenized sting content to the above routine. If it returns a good number do some comparisons to be sure it passes the above business rules and put the results in the end of that row. Being a good reuse programmer, I was wondering if anyone has something that does something kind-of similar to what I need before I start on my coding journey. Please reply to the group, so that other people can search on this and (hopefully) find a good answer. Thanks, Keith |
#2
![]()
Posted to microsoft.public.excel.misc,microsoft.public.excel.programming
|
|||
|
|||
![]()
kkknie wrote in message ...
Keith, I was a bit bored with my lunch so I worked this routine to do the data extraction. <Snip kkknie, Thanks for your bored time! I will try this out tomorrow, got caught up in one of those fires now (don't you love those things!). I really appreciate your effort! Thanks, Keith |
#3
![]()
Posted to microsoft.public.excel.misc,microsoft.public.excel.programming
|
|||
|
|||
![]()
Just in case someone needs it (including myself), here is the answer
that I came up with. It may be able to be done in a more efficent way, but the important part is that it works :-) - K Sub do_all_sheets() Dim i As Integer Dim numSheets As Integer numSheets = Application.Sheets.Count For i = 1 To numSheets On Error Resume Next If Worksheets(i).Visible = True Then Worksheets(i).Select On Error GoTo 0 do_sheet End If Next End Sub Sub do_sheet() Dim r As Range Dim r2 As Range Dim i As Integer Dim intLen As Integer Dim iCol As Integer Dim intCurCol As Integer Dim lastRow As Long Dim numRangeRows As Long Dim loopRangeCounter As Long Dim intCurRow As Integer Dim s() As String Dim lastCol As Long Dim strCellValue As String Dim strMaxCellValue As String Dim strCommentRangeName As String ' ' see if we have a comments column ' Rows("1:1").Select Range("A1").Activate On Error Resume Next Selection.Find(What:="comments", After:=ActiveCell, LookIn:=xlFormulas, _ LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, _ MatchCase:=False, SearchFormat:=False).Activate On Error GoTo 0 ' ' if we find the comments, good. If not, exit. ' If UCase(Application.ActiveCell.Value) < "COMMENTS" Then Exit Sub End If ' ' looks like we have a comments column, get the column name ' 'strCommentsColName = "AI" strCommentsColName = getCellColumnFromAddress(Application.ActiveCell.Ad dress) ' ' go the last column / row on the sheet ' Cells.Find(What:="*", After:=[A1], SearchDirection:=xlPrevious).Select strMaxCellValue = Application.ActiveCell.Address(False, False) lastRow = getCellRowNumberFromAddress(strMaxCellValue) 'lastCol = getCellColumnFromAddress(strMaxCellValue) 'lastRow = getCellRowNumberFromAddress( _ ' Application.ActiveSheet.Cells.SpecialCells(xlCellT ypeLastCell).Row) lastCol = getLastColumnFromSheet(lastRow) ' ' set range name of "Comments" row: 2 to the End (whatever that is) ' strCommentRangeName = strCommentsColName & "2:" & _ strCommentsColName & CStr(lastRow) 'Debug.Print "Comments range: " & strCommentRangeName Set r = Range(strCommentRangeName) ' ' check to see if we actually have any data in the "comment" column, ' if not we exit. ' If IsEmpty(r.Cells.Value2) Then Exit Sub End If numRangeRows = r.Count ' ' check the column, we normally put things in last col,, if there ' is anything in the worksheet in a further column, we don't want to ' overwrite the contents ' iCol = 55 If lastCol + 1 iCol Then iCol = lastCol + 1 End If For loopRangeCounter = 1 To numRangeRows ReDim s(0) ' ' strCellValue = r.FormulaR1C1(loopRangeCounter, 1) Call ExtractIt(strCellValue, s()) intCurRow = r.Row - 1 + loopRangeCounter intCurCol = iCol If s(0) < "NONE" Then For i = 0 To UBound(s) Cells(intCurRow, intCurCol).NumberFormat = "@" Cells(intCurRow, intCurCol).Font.Name = "Arial" Cells(intCurRow, intCurCol).Font.Size = 9 Cells(intCurRow, intCurCol).Value = s(i) intCurCol = intCurCol + 1 Next End If Next End Sub Sub ExtractIt(strCellValue As String, sOutput() As String) Dim iMin As Long Dim iMax As Long Dim iFound As Long Dim i As Long Dim lngStrLen As Long Dim strChar As String Dim incomingLine As String Dim tempStr As String Dim tempStrLen As String Dim r As Range Dim iLoop As Integer Dim b As Boolean Dim bBadPrefix As Boolean Dim bAtEndOfString As Boolean Dim bLengthTooShort As Boolean Dim bIsADate As Boolean Dim bDashCheck As Boolean Dim bSlashCheck As Boolean Dim bAddCell As Boolean ReDim Preserve sOutput(0) incomingLine = strCellValue sOutput(0) = "NONE" iLoop = 0 ' loop thru the entire line Do Until incomingLine = "" iMin = 99999 ' looking for a number bit For i = 0 To 9 iFound = InStr(1, incomingLine, i) If iFound 0 And iFound < iMin Then iMin = iFound End If Next If iMin = 99999 Then Exit Do End If lngStrLen = Len(incomingLine) bAtEndOfString = False iMax = -1 ' check number portion of the string For i = iMin To Len(incomingLine) strChar = Mid(incomingLine, i, 1) Select Case strChar Case 1 To 9, 0, "-", "/" b = False Case Else b = True iMax = i Exit For End Select ' ' check to see if we have a good number at the end of the line ' If lngStrLen = i And b = False Then bAtEndOfString = True Exit For End If Next If iMax -1 Or bAtEndOfString = True Then ReDim Preserve sOutput(iLoop) If bAtEndOfString = True Then tempStr = Mid(incomingLine, iMin) incomingLine = "" iMax = Len(tempStr) Else tempStr = Mid(incomingLine, iMin, iMax - iMin) incomingLine = Right(incomingLine, Len(incomingLine) - iMax + 1) End If tempStrLen = Len(tempStr) bAddCell = True bLengthTooShort = False bIsADate = False bDashCheck = False bHasSlashes = False ' ' check to see if there is more than one slash ' bHasSlashes = checkMoreThanOneSlash(tempStr) If bHasSlashes = False Then ' ' length < minimum ' If tempStrLen < 4 Then bLengthTooShort = True Else ' ' check to see if there is a date x/xx or xx/xx ' If tempStrLen < 6 Then bIsADate = checkOneSlashDate(tempStr) End If If bIsADate = False Then ' ' check to see if there is a dash on a short string ' "-xxx" or "-xxxx" ' If tempStrLen < 6 Then bDashCheck = checkOneDashCheck(tempStr) End If End If End If End If If bHasSlashes = True _ Or bBadPrefix = True _ Or bIsADate = True _ Or bDashCheck = True _ Or bLengthTooShort = True Then bAddCell = False End If ' ' if all is ok, write it out ' If bAddCell = True Then sOutput(iLoop) = tempStr iLoop = iLoop + 1 End If Else b = False End If Loop End Sub Function checkMoreThanOneSlash(incomingLine As String) As Boolean ' ' check to see if this string is a date-like field 11/11/11 or 11/11/1111 ' Dim varPosition As Variant Dim firstPos As Integer intPosition = -1 checkMoreThanOneSlash = False varPosition = InStr(1, incomingLine, "/", vbTextCompare) If IsNull(varPosition) = True Then Else If varPosition 0 Then firstPos = varPosition + 1 varPosition = InStr(firstPos, incomingLine, "/", vbTextCompare) If IsNull(varPosition) = True Then Else If varPosition 0 Then checkMoreThanOneSlash = True End If End If End If End If End Function Function getLastColumnFromSheet(lastRowNumber As Long) As Long ' ' ' Dim strChar As String Dim rowNumber As String Dim maxColumn As Long rowNumber = "" maxColumn = -1 Application.ScreenUpdating = False For i = lastRowNumber + 1 To 2 Step -1 Application.Cells(i + 1, 1).Select Application.ActiveSheet.Cells.Find(What:="*", After:=ActiveCell, SearchDirection:=xlPrevious).Select If Application.ActiveCell.Column maxColumn Then maxColumn = Application.ActiveCell.Column End If Next Application.ScreenUpdating = True getLastColumnFromSheet = maxColumn End Function Function getCellRowNumberFromAddress(cellName As String) As Long Dim strChar As String Dim rowNumber As String rowNumber = "" For i = 1 To Len(cellName) strChar = Mid(cellName, i, 1) Select Case strChar Case 1 To 9, 0 rowNumber = rowNumber & strChar Case Else End Select Next getCellRowNumberFromAddress = CLng(rowNumber) End Function Function getCellColumnFromAddress(cellAddress As String) As String Dim strChar As String Dim columnName As String rowNumber = "" For i = 1 To Len(cellAddress) strChar = Mid(cellAddress, i, 1) Select Case strChar Case "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", _ "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", _ "U", "V", "W", "X", "Y", "Z", _ "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", _ "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", _ "u", "v", "w", "x", "y", "z" columnName = columnName & strChar Case Else End Select Next getCellColumnFromAddress = columnName End Function Function checkOneSlashDate(incomingLine As String) As Boolean ' ' check to see if this string is a date-like field 11/11 or 1/11 ' Dim varPosition As Variant checkOneSlashDate = False varPosition = InStr(1, incomingLine, "/", vbTextCompare) If IsNull(varPosition) = True Then Else If varPosition 0 Then checkOneSlashDate = True End If End If End Function Function checkOneDashCheck(incomingLine As String) As Boolean ' ' check to see if this string is a short dash like "xxx-" or "-xxx" ' Dim varPosition As Variant checkOneDashCheck = False varPosition = InStr(1, incomingLine, "-", vbTextCompare) If IsNull(varPosition) = True Then Else If varPosition 0 Then checkOneDashCheck = True End If End If End Function |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
Converting A String to a Column of Data | Excel Worksheet Functions | |||
Cleansing/Data Comparison between spreadsheets | Excel Worksheet Functions | |||
Text string extraction | Excel Worksheet Functions | |||
String Extraction... | Excel Worksheet Functions | |||
Cleansing for CRM uploads | Excel Worksheet Functions |