View Single Post
  #3   Report Post  
Posted to microsoft.public.excel.programming
Ron Rosenfeld Ron Rosenfeld is offline
external usenet poster
 
Posts: 5,651
Default Data Parsing Problem

On Fri, 23 Dec 2005 08:43:02 -0800, Stratuser
wrote:

I need to parse text data copied onto an Excel worksheet from a big table in
a PDF file. Each line of text contains a stock's ticker symbol, CUSIP
number, company name, number of shares, and other data, in that order. The
problem is that the company name can be anything from one word to five words
separated by a space, and that messes up the parsing into columns. Using the
Data/Text-to-Columns won't work because company name gets split up into
multiple columns so that the number of shares falls in the wrong column. To
illustrate, here's a typical situation, with three lines of text:

XOM 30231G102 EXXON MOBIL CORP 622,900.00 63.5400 39,579,066.00 4.17%
PFE 717081103 PFIZER INC 1,475,900.00 24.9700 36,853,223.00 3.88%
AEP 025537101 AMERICAN ELEC PWR INC 377,300.00 39.7000 14,978,810.00 1.58%

Using VBA code, how can I parse these rows into columns correctly? Thanks
for your help.


In VBA code, you can do the following. I assumed you would define the range to
be parsed by selecting the cells in the column, but you could use other
methods. I also assumed there were always four pieces of data following the
stock name. (Shares, Price, Market Cap, Yield)

===============================================
Option Explicit
Sub StockData()
Dim c As Range
Dim ParsedData As Variant
Dim i As Long, j As Long

For Each c In Selection
ParsedData = Split(c.Text, " ")
i = 3
Do Until i = UBound(ParsedData) - 3
ParsedData(2) = ParsedData(2) & " " & ParsedData(i)
i = i + 1
Loop

For j = 3 To 6
ParsedData(j) = ParsedData(i + j - 3)
Next j

ReDim Preserve ParsedData(6)

For i = 0 To 6
c.Offset(0, i).Value = ParsedData(i)
Next i
Next c
End Sub
=====================================

This task can also be done with regular expressions in worksheet formulas.

For this method, first download and install Longre's free morefunc.xll add-in
from http://xcell05.free.fr/

Then you can use the following formulas:

Ticker: =REGEX.MID(A2,"\w+")
Cusip: =REGEX.MID(A2,"\w+",2)

Name:
=REGEX.MID(REGEX.SUBSTITUTE(A2,"^\w+\s+\w+\s+"),"( \w+\s+)+(?=.*\s.*\s.*\s.*%)")

Shares: =REGEX.MID($A2,"(\d+(,|\.|))+\s",-3)
Price: =REGEX.MID($A2,"(\d+(,|\.|))+\s",-2)
MarketCap: =REGEX.MID($A2,"(\d+(,|\.|))+\s",-1)
Yield: =REGEX.MID($A2,"(\d+(,|\.|))+%",-1)


--ron