Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 11
Default Extracting specific data from inconsistant multiple word documents

Hi Guys,

I have about a 1000 word documents and need to extract all the email
addresses from them. Ideally the result would place the filename in column A
and the email address in column B. None of the documents are consistant in
format. I have got as far as listing the file names.

Many thanks.
  #2   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 5,651
Default Extracting specific data from inconsistant multiple word documents

On Mon, 22 Dec 2008 01:47:01 -0800, Matt Bennette
wrote:

Hi Guys,

I have about a 1000 word documents and need to extract all the email
addresses from them. Ideally the result would place the filename in column A
and the email address in column B. None of the documents are consistant in
format. I have got as far as listing the file names.

Many thanks.


There are ways of recognizing email addresses within text strings. For
example, depending on your data, you could look for the "@" character. Or
perhaps you could look for a string in which the "@" is followed by one or more
characters; then a ".", then a few more characters.

--ron
  #3   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 11
Default Extracting specific data from inconsistant multiple word docum

Hi Ron

I am pretty new to VBA, but was hoping that I could engineer a script that
would produce a spreadsheet with the file name in column A and the Email
address in Column B

This is what I am currently using to get the filenames

Function GetFileList(FileSpec As String) As Variant
' Returns an array of filenames that match FileSpec
' If no matching files are found, it returns False

Dim FileArray() As Variant
Dim FileCount As Integer
Dim FileName As String

On Error GoTo NoFilesFound

FileCount = 0
FileName = Dir(FileSpec)
If FileName = "" Then GoTo NoFilesFound

' Loop until no more matching files are found
Do While FileName < ""
FileCount = FileCount + 1
ReDim Preserve FileArray(1 To FileCount)
FileArray(FileCount) = FileName
FileName = Dir()
Loop
GetFileList = FileArray
Exit Function

' Error handler
NoFilesFound:
GetFileList = False
End Function


Sub test()
Dim p As String, x As Variant

p = "C:\Documents and Settings\Administrator\Desktop\Chefs\*.*"
x = GetFileList(p)

Select Case IsArray(x)
Case True 'files found
MsgBox UBound(x)
Sheets("Sheet1").Range("A:A").Clear
For i = LBound(x) To UBound(x)
Sheets("Sheet1").Cells(i, 1).Value = x(i)
Next i
Case False 'no files found
MsgBox "No matching files"
End Select
End Sub

Which I confess was donated by one of the fab subscribers here

The strings would be of differing lengths, so how would wild card work is
there a specific syntax i.e *?*@*?*.*?*. And what about VBA referencing Word
Commands to perform this

Sorry I'm a complete novice and would really appreciate some help on this

Many thanks

"Ron Rosenfeld" wrote:

On Mon, 22 Dec 2008 01:47:01 -0800, Matt Bennette
wrote:

Hi Guys,

I have about a 1000 word documents and need to extract all the email
addresses from them. Ideally the result would place the filename in column A
and the email address in column B. None of the documents are consistant in
format. I have got as far as listing the file names.

Many thanks.


There are ways of recognizing email addresses within text strings. For
example, depending on your data, you could look for the "@" character. Or
perhaps you could look for a string in which the "@" is followed by one or more
characters; then a ".", then a few more characters.

--ron

  #4   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 5,651
Default Extracting specific data from inconsistant multiple word docum

On Mon, 22 Dec 2008 08:15:01 -0800, Matt Bennette
wrote:

The strings would be of differing lengths, so how would wild card work is
there a specific syntax i.e *?*@*?*.*?*. And what about VBA referencing Word
Commands to perform this


I don't know about "VBA referencing Word Commands"

The following uses Regular Expressions:

Is there only one email address per file? Or could there be multiple email
addresses in a file.

Here is a routine which, when applied against a text string of unspecified
length, will return the first string that looks like an email address. The
pattern does NOT contain all the rules for validating an email address, but
perhaps this method will get you started. Post back with more questions as
needed.

=====================================
Option Explicit
Public Const str As String = "now is the time for "
'--------------------------------------

Sub ExtrEmail()
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
If re.test(str) = True Then
Set mc = re.Execute(str)
Debug.Print mc(0).Value
End If
End Sub
==================================
--ron
  #5   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 11
Default Extracting specific data from inconsistant multiple word docum

In general, there is only one email address, however I dont understand how I
can apply this code to multiple word documents.

Ideally I need the result to list the file name and the extracted email
address.

"Ron Rosenfeld" wrote:

On Mon, 22 Dec 2008 08:15:01 -0800, Matt Bennette
wrote:

The strings would be of differing lengths, so how would wild card work is
there a specific syntax i.e *?*@*?*.*?*. And what about VBA referencing Word
Commands to perform this


I don't know about "VBA referencing Word Commands"

The following uses Regular Expressions:

Is there only one email address per file? Or could there be multiple email
addresses in a file.

Here is a routine which, when applied against a text string of unspecified
length, will return the first string that looks like an email address. The
pattern does NOT contain all the rules for validating an email address, but
perhaps this method will get you started. Post back with more questions as
needed.

=====================================
Option Explicit
Public Const str As String = "now is the time for "
'--------------------------------------

Sub ExtrEmail()
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
If re.test(str) = True Then
Set mc = re.Execute(str)
Debug.Print mc(0).Value
End If
End Sub
==================================
--ron

Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Extracting Specific Data from one sheet to another. Pulling Data from one sheet to another. Excel Worksheet Functions 1 July 14th 08 10:42 PM
multiple worksheets in workbook mail merged into a word documents la Excel Worksheet Functions 1 August 8th 07 05:52 PM
Extracting specific word in a cell Conditional Formatting Excel Worksheet Functions 2 February 14th 06 09:57 PM
Extracting specific data from a cell Neil Bowen Excel Worksheet Functions 5 October 30th 05 04:33 PM
Pivot Table - Extracting specific data JT Excel Worksheet Functions 1 June 3rd 05 06:05 PM


All times are GMT +1. The time now is 07:22 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"