Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.misc
Dorn
 
Posts: n/a
Default I'm stumped, splitting up a clump of text into useable information

Here's the problem:

The products we sell have a product name, each product has one or more
grades of the product (which usually relate to the quality of it) for each
grade we have a package size and a unique number called a "SKU" that
identifies that grade and package size in our system.

I need to take the source code of a page with all of this information and
turn it into an excel sheet with columns titled "product" "grade" "packaging"
"sku" for each of the grades. The format of the html source code throws in
all sorts of characters that I need to get rid of. It also lumps it into one
block text with no line breaks. The below text file shows the format of two
products, the first product has 2 grades and the second product has 3 grades.
How can I use macros, formulas, anything to get the block of text into the
format I want. It would take me years if I go in by hand and remove the
unwanted characters and manually copy and paste them into cells. Please help
I would appreciate it so so so much. Is this even possible (or feasible).
There is no way I can get any other form of the data.
Here is an example of what I have:

http://www.savefile.com/files/7119995

Example of what I need:

http://www.savefile.com/files/9420878

Happy Holidays Everyone, and a huge thank you for all of the people who help
people on this site!
  #2   Report Post  
Posted to microsoft.public.excel.misc
CLR
 
Posts: n/a
Default I'm stumped, splitting up a clump of text into useable information

ASAP Utilities has a feature that will remove any selected character(s) from
text......it's free from www.asap-utilities.com

Vaya con Dios,
Chuck, CABGx3


"Dorn" wrote in message
...
Here's the problem:

The products we sell have a product name, each product has one or more
grades of the product (which usually relate to the quality of it) for each
grade we have a package size and a unique number called a "SKU" that
identifies that grade and package size in our system.

I need to take the source code of a page with all of this information and
turn it into an excel sheet with columns titled "product" "grade"

"packaging"
"sku" for each of the grades. The format of the html source code throws

in
all sorts of characters that I need to get rid of. It also lumps it into

one
block text with no line breaks. The below text file shows the format of

two
products, the first product has 2 grades and the second product has 3

grades.
How can I use macros, formulas, anything to get the block of text into

the
format I want. It would take me years if I go in by hand and remove the
unwanted characters and manually copy and paste them into cells. Please

help
I would appreciate it so so so much. Is this even possible (or feasible).
There is no way I can get any other form of the data.
Here is an example of what I have:

http://www.savefile.com/files/7119995

Example of what I need:

http://www.savefile.com/files/9420878

Happy Holidays Everyone, and a huge thank you for all of the people who

help
people on this site!



  #3   Report Post  
Posted to microsoft.public.excel.misc
Pete
 
Posts: n/a
Default I'm stumped, splitting up a clump of text into useable information

Well, you might be better taking up Chuck's offer ...

I've looked at your text file in Notepad. With word-wrap on and a few
hard-returns, you can see the structure in there quite easily and some
codes are fairly obvious - " for ", < for Left Tab etc. You
could do successive Find & Replace within Notepad to tidy up the file
quite a bit. However, there are other codes - presumably c2, c3 etc
relate to column 2, column 3 - so it's really down to how well you can
recognise patterns as to how long it will take you.

I think the general strategy would be to use Notepad (or Wordpad) to
convert this into a format which could ultimately be read by Excel. You
may still have a lot of tidying up to do once the data has been put
into Excel.

Pete

  #4   Report Post  
Posted to microsoft.public.excel.misc
wjohnson
 
Posts: n/a
Default I'm stumped, splitting up a clump of text into useable information


Where does the "original" information come from? If possible can you
include a sample of it, i.e. HTML or whatever.


--
wjohnson
------------------------------------------------------------------------
wjohnson's Profile: http://www.excelforum.com/member.php...o&userid=29640
View this thread: http://www.excelforum.com/showthread...hreadid=495673

  #5   Report Post  
Posted to microsoft.public.excel.misc
Dorn
 
Posts: n/a
Default I'm stumped, splitting up a clump of text into useable informa

Unfortuantely I don't know if I am able to release all of our packaging sizes
and grades, however the attached text file shows the unique characters for
each type of data. If I could just find a way to split up the text into a
new row everytime it sees the word "productmaster" that would cut down on the
time it takes me to do this considerably. The text file below shows the
exact format of the information without line breaks. The full version just
has over 500 products with over 2000 grades. Just incase the text file won't
download here is an example for two products (with a total of 5 product
grades):

ProductMaster<ProductMaster Name="Actafoam(R)"<Product Grade="F2,
Powder"<Grade SKU="0115791"<c266.138 lb drum</c2<c30115791</c3<c4
null="1"</c4<c5 null="1"</c5</Grade</Product<Product Grade="R-3"<Grade
SKU="0114873"<c2449.7384 lb drum</c2<c30114873</c3<c4 null="1"</c4<c5
null="1"</c5</Grade</Product</ProductMaster<ProductMaster
Name="Acudyne"<Product Grade="180, 48 solid"<Grade SKU="0113541"<c2473.99
lb drum</c2<c30113541</c3<c4 null="1"</c4<c5
null="1"</c5</Grade</Product<Product Grade="DHR, 48 solid"<Grade
SKU="0113550"<c2473.99 lb drum</c2<c30113550</c3<c4 null="1"</c4<c5
null="1"</c5</Grade</Product<Product Grade="SCP 25% Solid"<Grade
SKU="0113548"<c2473.99 lb drum</c2<c30113548</c3<c4 null="1"</c4<c5
null="1"</c5</Grade</Product</

As you can see the letters "productmaster" is an identifier of a new
product, "product grade" is an identifier of a new grade and so on. If I
could at least break it up a little bit into cells I could go through with
the asap utility and clean it up. I don't expect this to be quick, just
quicker.

Thanks!

"wjohnson" wrote:


Where does the "original" information come from? If possible can you
include a sample of it, i.e. HTML or whatever.


--
wjohnson
------------------------------------------------------------------------
wjohnson's Profile: http://www.excelforum.com/member.php...o&userid=29640
View this thread: http://www.excelforum.com/showthread...hreadid=495673




  #6   Report Post  
Posted to microsoft.public.excel.misc
Brian Handly
 
Posts: n/a
Default I'm stumped, splitting up a clump of text into useable informa

Dorn wrote:
Unfortuantely I don't know if I am able to release all of our packaging sizes
and grades, however the attached text file shows the unique characters for
each type of data. If I could just find a way to split up the text into a
new row everytime it sees the word "productmaster" that would cut down on the
time it takes me to do this considerably. The text file below shows the
exact format of the information without line breaks. The full version just
has over 500 products with over 2000 grades. Just incase the text file won't
download here is an example for two products (with a total of 5 product
grades):

ProductMaster<ProductMaster Name="Actafoam(R)"<Product Grade="F2,
Powder"<Grade SKU="0115791"<c266.138 lb drum</c2<c30115791</c3<c4
null="1"</c4<c5 null="1"</c5</Grade</Product<Product Grade="R-3"<Grade
SKU="0114873"<c2449.7384 lb drum</c2<c30114873</c3<c4 null="1"</c4<c5
null="1"</c5</Grade</Product</ProductMaster<ProductMaster
Name="Acudyne"<Product Grade="180, 48 solid"<Grade SKU="0113541"<c2473.99
lb drum</c2<c30113541</c3<c4 null="1"</c4<c5
null="1"</c5</Grade</Product<Product Grade="DHR, 48 solid"<Grade
SKU="0113550"<c2473.99 lb drum</c2<c30113550</c3<c4 null="1"</c4<c5
null="1"</c5</Grade</Product<Product Grade="SCP 25% Solid"<Grade
SKU="0113548"<c2473.99 lb drum</c2<c30113548</c3<c4 null="1"</c4<c5
null="1"</c5</Grade</Product</

As you can see the letters "productmaster" is an identifier of a new
product, "product grade" is an identifier of a new grade and so on. If I
could at least break it up a little bit into cells I could go through with
the asap utility and clean it up. I don't expect this to be quick, just
quicker.

Thanks!

"wjohnson" wrote:

Where does the "original" information come from? If possible can you
include a sample of it, i.e. HTML or whatever.


--
wjohnson
------------------------------------------------------------------------
wjohnson's Profile: http://www.excelforum.com/member.php...o&userid=29640
View this thread: http://www.excelforum.com/showthread...hreadid=495673


Dorn

1. What software generated the file? Do you have control of that
software such that you could see if it has other export formats that
might result in 1 row per item?
2. You may want to investigate opening the file in WORD and using Find |
Replace to replace "<productmaster" with "Line Break + <productmaster"

Texas Handly
  #7   Report Post  
Posted to microsoft.public.excel.misc
Pete
 
Posts: n/a
Default I'm stumped, splitting up a clump of text into useable informa

This file is slightly different (easier) than that pointed to in the
link in your original post - in that you had combinations like
&quot;|&quot and ;&gt;&lt; which made it more difficult to see the
pattern. Following on from Brian's suggestion, you want to
Find <ProductMaster Name=
Replace with Line Break
as you don't need the text. Similarly,
Find <Product Grade=
Replace with , (comma), and
Find <Grade SKU=
Replace with ,

This way you can build up a csv file quite quickly and then bring it
into Excel.

Pete

  #8   Report Post  
Posted to microsoft.public.excel.misc
Dorn
 
Posts: n/a
Default I'm stumped, splitting up a clump of text into useable informa

Thanks for all of your help and suggestions. I ended up figuring out how to
do it with a non excel solution. I used edit replace in notepad to change
all of the jibberish into html table tags then I saved it as a .html opened
it with internet explorer and copied and pasted the table into excel

"Pete" wrote:

This file is slightly different (easier) than that pointed to in the
link in your original post - in that you had combinations like
"|&quot and ;< which made it more difficult to see the
pattern. Following on from Brian's suggestion, you want to
Find <ProductMaster Name=
Replace with Line Break
as you don't need the text. Similarly,
Find <Product Grade=
Replace with , (comma), and
Find <Grade SKU=
Replace with ,

This way you can build up a csv file quite quickly and then bring it
into Excel.

Pete


  #9   Report Post  
Posted to microsoft.public.excel.misc
exceluserforeman
 
Posts: n/a
Default I'm stumped, splitting up a clump of text into useable information

Hello,
Try my web table import utility at:
http://www.geocities.com/excelmarksway


"Dorn" wrote:

Here's the problem:

The products we sell have a product name, each product has one or more
grades of the product (which usually relate to the quality of it) for each
grade we have a package size and a unique number called a "SKU" that
identifies that grade and package size in our system.

I need to take the source code of a page with all of this information and
turn it into an excel sheet with columns titled "product" "grade" "packaging"
"sku" for each of the grades. The format of the html source code throws in
all sorts of characters that I need to get rid of. It also lumps it into one
block text with no line breaks. The below text file shows the format of two
products, the first product has 2 grades and the second product has 3 grades.
How can I use macros, formulas, anything to get the block of text into the
format I want. It would take me years if I go in by hand and remove the
unwanted characters and manually copy and paste them into cells. Please help
I would appreciate it so so so much. Is this even possible (or feasible).
There is no way I can get any other form of the data.
Here is an example of what I have:

http://www.savefile.com/files/7119995

Example of what I need:

http://www.savefile.com/files/9420878

Happy Holidays Everyone, and a huge thank you for all of the people who help
people on this site!

Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
How do I graph text information in Excel garrison0013 Charts and Charting in Excel 5 April 3rd 23 03:41 PM
merging text information danielle Excel Discussion (Misc queries) 1 June 13th 05 02:20 PM
how do i make a chart using text information? danielle Excel Worksheet Functions 5 February 26th 05 09:21 PM
Read Text File into Excel Using VBA Willie T Excel Discussion (Misc queries) 13 January 8th 05 12:37 AM
Splitting text in one column into two (or more) columns. RickyDee Excel Worksheet Functions 4 December 7th 04 10:03 PM


All times are GMT +1. The time now is 09:19 PM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"