Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.misc
|
|||
|
|||
I'm stumped, splitting up a clump of text into useable information
Here's the problem:
The products we sell have a product name, each product has one or more grades of the product (which usually relate to the quality of it) for each grade we have a package size and a unique number called a "SKU" that identifies that grade and package size in our system. I need to take the source code of a page with all of this information and turn it into an excel sheet with columns titled "product" "grade" "packaging" "sku" for each of the grades. The format of the html source code throws in all sorts of characters that I need to get rid of. It also lumps it into one block text with no line breaks. The below text file shows the format of two products, the first product has 2 grades and the second product has 3 grades. How can I use macros, formulas, anything to get the block of text into the format I want. It would take me years if I go in by hand and remove the unwanted characters and manually copy and paste them into cells. Please help I would appreciate it so so so much. Is this even possible (or feasible). There is no way I can get any other form of the data. Here is an example of what I have: http://www.savefile.com/files/7119995 Example of what I need: http://www.savefile.com/files/9420878 Happy Holidays Everyone, and a huge thank you for all of the people who help people on this site! |
#2
Posted to microsoft.public.excel.misc
|
|||
|
|||
I'm stumped, splitting up a clump of text into useable information
ASAP Utilities has a feature that will remove any selected character(s) from
text......it's free from www.asap-utilities.com Vaya con Dios, Chuck, CABGx3 "Dorn" wrote in message ... Here's the problem: The products we sell have a product name, each product has one or more grades of the product (which usually relate to the quality of it) for each grade we have a package size and a unique number called a "SKU" that identifies that grade and package size in our system. I need to take the source code of a page with all of this information and turn it into an excel sheet with columns titled "product" "grade" "packaging" "sku" for each of the grades. The format of the html source code throws in all sorts of characters that I need to get rid of. It also lumps it into one block text with no line breaks. The below text file shows the format of two products, the first product has 2 grades and the second product has 3 grades. How can I use macros, formulas, anything to get the block of text into the format I want. It would take me years if I go in by hand and remove the unwanted characters and manually copy and paste them into cells. Please help I would appreciate it so so so much. Is this even possible (or feasible). There is no way I can get any other form of the data. Here is an example of what I have: http://www.savefile.com/files/7119995 Example of what I need: http://www.savefile.com/files/9420878 Happy Holidays Everyone, and a huge thank you for all of the people who help people on this site! |
#3
Posted to microsoft.public.excel.misc
|
|||
|
|||
I'm stumped, splitting up a clump of text into useable information
Well, you might be better taking up Chuck's offer ...
I've looked at your text file in Notepad. With word-wrap on and a few hard-returns, you can see the structure in there quite easily and some codes are fairly obvious - " for ", < for Left Tab etc. You could do successive Find & Replace within Notepad to tidy up the file quite a bit. However, there are other codes - presumably c2, c3 etc relate to column 2, column 3 - so it's really down to how well you can recognise patterns as to how long it will take you. I think the general strategy would be to use Notepad (or Wordpad) to convert this into a format which could ultimately be read by Excel. You may still have a lot of tidying up to do once the data has been put into Excel. Pete |
#4
Posted to microsoft.public.excel.misc
|
|||
|
|||
I'm stumped, splitting up a clump of text into useable information
Where does the "original" information come from? If possible can you include a sample of it, i.e. HTML or whatever. -- wjohnson ------------------------------------------------------------------------ wjohnson's Profile: http://www.excelforum.com/member.php...o&userid=29640 View this thread: http://www.excelforum.com/showthread...hreadid=495673 |
#5
Posted to microsoft.public.excel.misc
|
|||
|
|||
I'm stumped, splitting up a clump of text into useable informa
Unfortuantely I don't know if I am able to release all of our packaging sizes
and grades, however the attached text file shows the unique characters for each type of data. If I could just find a way to split up the text into a new row everytime it sees the word "productmaster" that would cut down on the time it takes me to do this considerably. The text file below shows the exact format of the information without line breaks. The full version just has over 500 products with over 2000 grades. Just incase the text file won't download here is an example for two products (with a total of 5 product grades): ProductMaster<ProductMaster Name="Actafoam(R)"<Product Grade="F2, Powder"<Grade SKU="0115791"<c266.138 lb drum</c2<c30115791</c3<c4 null="1"</c4<c5 null="1"</c5</Grade</Product<Product Grade="R-3"<Grade SKU="0114873"<c2449.7384 lb drum</c2<c30114873</c3<c4 null="1"</c4<c5 null="1"</c5</Grade</Product</ProductMaster<ProductMaster Name="Acudyne"<Product Grade="180, 48 solid"<Grade SKU="0113541"<c2473.99 lb drum</c2<c30113541</c3<c4 null="1"</c4<c5 null="1"</c5</Grade</Product<Product Grade="DHR, 48 solid"<Grade SKU="0113550"<c2473.99 lb drum</c2<c30113550</c3<c4 null="1"</c4<c5 null="1"</c5</Grade</Product<Product Grade="SCP 25% Solid"<Grade SKU="0113548"<c2473.99 lb drum</c2<c30113548</c3<c4 null="1"</c4<c5 null="1"</c5</Grade</Product</ As you can see the letters "productmaster" is an identifier of a new product, "product grade" is an identifier of a new grade and so on. If I could at least break it up a little bit into cells I could go through with the asap utility and clean it up. I don't expect this to be quick, just quicker. Thanks! "wjohnson" wrote: Where does the "original" information come from? If possible can you include a sample of it, i.e. HTML or whatever. -- wjohnson ------------------------------------------------------------------------ wjohnson's Profile: http://www.excelforum.com/member.php...o&userid=29640 View this thread: http://www.excelforum.com/showthread...hreadid=495673 |
#6
Posted to microsoft.public.excel.misc
|
|||
|
|||
I'm stumped, splitting up a clump of text into useable informa
Dorn wrote:
Unfortuantely I don't know if I am able to release all of our packaging sizes and grades, however the attached text file shows the unique characters for each type of data. If I could just find a way to split up the text into a new row everytime it sees the word "productmaster" that would cut down on the time it takes me to do this considerably. The text file below shows the exact format of the information without line breaks. The full version just has over 500 products with over 2000 grades. Just incase the text file won't download here is an example for two products (with a total of 5 product grades): ProductMaster<ProductMaster Name="Actafoam(R)"<Product Grade="F2, Powder"<Grade SKU="0115791"<c266.138 lb drum</c2<c30115791</c3<c4 null="1"</c4<c5 null="1"</c5</Grade</Product<Product Grade="R-3"<Grade SKU="0114873"<c2449.7384 lb drum</c2<c30114873</c3<c4 null="1"</c4<c5 null="1"</c5</Grade</Product</ProductMaster<ProductMaster Name="Acudyne"<Product Grade="180, 48 solid"<Grade SKU="0113541"<c2473.99 lb drum</c2<c30113541</c3<c4 null="1"</c4<c5 null="1"</c5</Grade</Product<Product Grade="DHR, 48 solid"<Grade SKU="0113550"<c2473.99 lb drum</c2<c30113550</c3<c4 null="1"</c4<c5 null="1"</c5</Grade</Product<Product Grade="SCP 25% Solid"<Grade SKU="0113548"<c2473.99 lb drum</c2<c30113548</c3<c4 null="1"</c4<c5 null="1"</c5</Grade</Product</ As you can see the letters "productmaster" is an identifier of a new product, "product grade" is an identifier of a new grade and so on. If I could at least break it up a little bit into cells I could go through with the asap utility and clean it up. I don't expect this to be quick, just quicker. Thanks! "wjohnson" wrote: Where does the "original" information come from? If possible can you include a sample of it, i.e. HTML or whatever. -- wjohnson ------------------------------------------------------------------------ wjohnson's Profile: http://www.excelforum.com/member.php...o&userid=29640 View this thread: http://www.excelforum.com/showthread...hreadid=495673 Dorn 1. What software generated the file? Do you have control of that software such that you could see if it has other export formats that might result in 1 row per item? 2. You may want to investigate opening the file in WORD and using Find | Replace to replace "<productmaster" with "Line Break + <productmaster" Texas Handly |
#7
Posted to microsoft.public.excel.misc
|
|||
|
|||
I'm stumped, splitting up a clump of text into useable informa
This file is slightly different (easier) than that pointed to in the
link in your original post - in that you had combinations like "|" and ;>< which made it more difficult to see the pattern. Following on from Brian's suggestion, you want to Find <ProductMaster Name= Replace with Line Break as you don't need the text. Similarly, Find <Product Grade= Replace with , (comma), and Find <Grade SKU= Replace with , This way you can build up a csv file quite quickly and then bring it into Excel. Pete |
#8
Posted to microsoft.public.excel.misc
|
|||
|
|||
I'm stumped, splitting up a clump of text into useable informa
Thanks for all of your help and suggestions. I ended up figuring out how to
do it with a non excel solution. I used edit replace in notepad to change all of the jibberish into html table tags then I saved it as a .html opened it with internet explorer and copied and pasted the table into excel "Pete" wrote: This file is slightly different (easier) than that pointed to in the link in your original post - in that you had combinations like "|" and ;< which made it more difficult to see the pattern. Following on from Brian's suggestion, you want to Find <ProductMaster Name= Replace with Line Break as you don't need the text. Similarly, Find <Product Grade= Replace with , (comma), and Find <Grade SKU= Replace with , This way you can build up a csv file quite quickly and then bring it into Excel. Pete |
#9
Posted to microsoft.public.excel.misc
|
|||
|
|||
I'm stumped, splitting up a clump of text into useable information
Hello,
Try my web table import utility at: http://www.geocities.com/excelmarksway "Dorn" wrote: Here's the problem: The products we sell have a product name, each product has one or more grades of the product (which usually relate to the quality of it) for each grade we have a package size and a unique number called a "SKU" that identifies that grade and package size in our system. I need to take the source code of a page with all of this information and turn it into an excel sheet with columns titled "product" "grade" "packaging" "sku" for each of the grades. The format of the html source code throws in all sorts of characters that I need to get rid of. It also lumps it into one block text with no line breaks. The below text file shows the format of two products, the first product has 2 grades and the second product has 3 grades. How can I use macros, formulas, anything to get the block of text into the format I want. It would take me years if I go in by hand and remove the unwanted characters and manually copy and paste them into cells. Please help I would appreciate it so so so much. Is this even possible (or feasible). There is no way I can get any other form of the data. Here is an example of what I have: http://www.savefile.com/files/7119995 Example of what I need: http://www.savefile.com/files/9420878 Happy Holidays Everyone, and a huge thank you for all of the people who help people on this site! |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
How do I graph text information in Excel | Charts and Charting in Excel | |||
merging text information | Excel Discussion (Misc queries) | |||
how do i make a chart using text information? | Excel Worksheet Functions | |||
Read Text File into Excel Using VBA | Excel Discussion (Misc queries) | |||
Splitting text in one column into two (or more) columns. | Excel Worksheet Functions |