ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Programming (https://www.excelbanter.com/excel-programming/)
-   -   Data Parsing Problem (https://www.excelbanter.com/excel-programming/348857-data-parsing-problem.html)

Stratuser

Data Parsing Problem
 
I need to parse text data copied onto an Excel worksheet from a big table in
a PDF file. Each line of text contains a stock's ticker symbol, CUSIP
number, company name, number of shares, and other data, in that order. The
problem is that the company name can be anything from one word to five words
separated by a space, and that messes up the parsing into columns. Using the
Data/Text-to-Columns won't work because company name gets split up into
multiple columns so that the number of shares falls in the wrong column. To
illustrate, here's a typical situation, with three lines of text:

XOM 30231G102 EXXON MOBIL CORP 622,900.00 63.5400 39,579,066.00 4.17%
PFE 717081103 PFIZER INC 1,475,900.00 24.9700 36,853,223.00 3.88%
AEP 025537101 AMERICAN ELEC PWR INC 377,300.00 39.7000 14,978,810.00 1.58%

Using VBA code, how can I parse these rows into columns correctly? Thanks
for your help.



Tom Ogilvy

Data Parsing Problem
 
This worked with your sample date:

Option Explicit
Sub fixColumns()
Dim rng As Range
Dim rng1 As Range
Dim rng2 As Range
Dim cell As Range, sStr As String
Range("A1").CurrentRegion.Columns(1).TextToColumns _
Destination:=Range("A1"), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, _
ConsecutiveDelimiter:=True, Tab:=True, _
Semicolon:=False, Comma:=False, Space:=True, _
Other:=False, FieldInfo:=Array( _
Array(1, 1), Array(2, 1), Array(3, 1), _
Array(4, 1), Array(5, 1), Array(6, 1), _
Array(7, 1), Array(8, 1), Array(9, 1), _
Array(10, 1)), TrailingMinusNumbers:=True
Rows(1).Insert
Range("A1:H1").Value = Array("H1", "H2", _
"H3", "H4", "H5", "H6", "H7", "H8")
ActiveSheet.AutoFilterMode = False
Set rng = Intersect(ActiveSheet.Range("A1") _
.CurrentRegion.EntireRow, Columns(8))
rng.AutoFilter Field:=1, Criteria1:="<"
Set rng2 = rng.Offset(1, -5).Resize(rng.Rows.Count - 1, 1)
On Error Resume Next
Set rng1 = rng2.SpecialCells(xlVisible)
On Error GoTo 0
Do While Not rng1 Is Nothing
For Each cell In rng1
sStr = cell.Value & " " & cell.Offset(0, 1).Value
cell.Value = sStr
cell.Offset(0, 1).Delete Shift:=xlShiftToLeft
Next
On Error Resume Next
Set rng1 = Nothing
ActiveSheet.AutoFilterMode = False
rng.AutoFilter Field:=1, Criteria1:="<"
Set rng1 = rng2.SpecialCells(xlVisible)
On Error GoTo 0
Debug.Print rng.Address, rng2.Address
Loop
Rows(1).Delete
MsgBox "No data in column 8 - done"

End Sub

--
Regards,
Tom Ogilvy

"Stratuser" wrote in message
...
I need to parse text data copied onto an Excel worksheet from a big table

in
a PDF file. Each line of text contains a stock's ticker symbol, CUSIP
number, company name, number of shares, and other data, in that order.

The
problem is that the company name can be anything from one word to five

words
separated by a space, and that messes up the parsing into columns. Using

the
Data/Text-to-Columns won't work because company name gets split up into
multiple columns so that the number of shares falls in the wrong column.

To
illustrate, here's a typical situation, with three lines of text:

XOM 30231G102 EXXON MOBIL CORP 622,900.00 63.5400 39,579,066.00 4.17%
PFE 717081103 PFIZER INC 1,475,900.00 24.9700 36,853,223.00 3.88%
AEP 025537101 AMERICAN ELEC PWR INC 377,300.00 39.7000 14,978,810.00 1.58%

Using VBA code, how can I parse these rows into columns correctly? Thanks
for your help.





Ron Rosenfeld

Data Parsing Problem
 
On Fri, 23 Dec 2005 08:43:02 -0800, Stratuser
wrote:

I need to parse text data copied onto an Excel worksheet from a big table in
a PDF file. Each line of text contains a stock's ticker symbol, CUSIP
number, company name, number of shares, and other data, in that order. The
problem is that the company name can be anything from one word to five words
separated by a space, and that messes up the parsing into columns. Using the
Data/Text-to-Columns won't work because company name gets split up into
multiple columns so that the number of shares falls in the wrong column. To
illustrate, here's a typical situation, with three lines of text:

XOM 30231G102 EXXON MOBIL CORP 622,900.00 63.5400 39,579,066.00 4.17%
PFE 717081103 PFIZER INC 1,475,900.00 24.9700 36,853,223.00 3.88%
AEP 025537101 AMERICAN ELEC PWR INC 377,300.00 39.7000 14,978,810.00 1.58%

Using VBA code, how can I parse these rows into columns correctly? Thanks
for your help.


In VBA code, you can do the following. I assumed you would define the range to
be parsed by selecting the cells in the column, but you could use other
methods. I also assumed there were always four pieces of data following the
stock name. (Shares, Price, Market Cap, Yield)

===============================================
Option Explicit
Sub StockData()
Dim c As Range
Dim ParsedData As Variant
Dim i As Long, j As Long

For Each c In Selection
ParsedData = Split(c.Text, " ")
i = 3
Do Until i = UBound(ParsedData) - 3
ParsedData(2) = ParsedData(2) & " " & ParsedData(i)
i = i + 1
Loop

For j = 3 To 6
ParsedData(j) = ParsedData(i + j - 3)
Next j

ReDim Preserve ParsedData(6)

For i = 0 To 6
c.Offset(0, i).Value = ParsedData(i)
Next i
Next c
End Sub
=====================================

This task can also be done with regular expressions in worksheet formulas.

For this method, first download and install Longre's free morefunc.xll add-in
from http://xcell05.free.fr/

Then you can use the following formulas:

Ticker: =REGEX.MID(A2,"\w+")
Cusip: =REGEX.MID(A2,"\w+",2)

Name:
=REGEX.MID(REGEX.SUBSTITUTE(A2,"^\w+\s+\w+\s+"),"( \w+\s+)+(?=.*\s.*\s.*\s.*%)")

Shares: =REGEX.MID($A2,"(\d+(,|\.|))+\s",-3)
Price: =REGEX.MID($A2,"(\d+(,|\.|))+\s",-2)
MarketCap: =REGEX.MID($A2,"(\d+(,|\.|))+\s",-1)
Yield: =REGEX.MID($A2,"(\d+(,|\.|))+%",-1)


--ron


All times are GMT +1. The time now is 03:31 PM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
ExcelBanter.com