View Single Post
  #3   Report Post  
Posted to microsoft.public.excel.misc
Ron Rosenfeld Ron Rosenfeld is offline
external usenet poster
 
Posts: 5,651
Default Compiling Email Addresses from Text

On Sat, 2 Feb 2008 10:32:00 -0800, Ralph
wrote:

Hi All,

I frequently need to extract email addresses from huge amounts of text, like
40+ pages long, etc... for large web design clients of mine...

I was wondering if a macro could be developed somehow to leave me with a
stack of email addresses. I suppose the macro would have to test each piece
of text for an "@" and a ".com" and then stack only those terms in a column
somewhere. Any ideas on how to do this? I am not macro savvy AT ALL...

thanks!


How is the text organized?

Does your source data present itself in an Excel workbook?

It can be difficult to decide on whether a string is a valid email address,
without actually using it to see if it "goes through". But you can certainly
use a "regular expression" that would detect most of them. It would certainly
make it easier if they all ended in ".com", but there are many top level
domains that are valid.

Here's an example of a routine that steps through a range of cells, and
extracts everything that looks like a valid email address with specific
top-level domains.

In this case it "prints" the results to the immediate window, but obviously
both the source and destinations can be modified fairly readily, once we have
more knowledge of what we are dealing with.

For example, the range to search could be changed to a file; and the
destination to sequential cells in a column.

====================================
Option Explicit
Sub ExtractEmails()
Dim c As Range
Dim str As String
Dim re As Object, mc As Object, m As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
'This regex probably complies with RFC 2822 and matches all
'country code top level domains, and specific common top level domains.
re.Pattern = "[a-z0-9!#$%&'*+/=?^_`{|}~-]+" & _
"(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@" & _
"(?:[a-z0-9](?:[a-z0-9\-]*[a-z0-9])?\.)+" & _
"(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|name" & _
"|aero|biz|info|mobi|jobs|museum)\b"
For Each c In Range("A1:A100")
str = c.Value
Set mc = re.Execute(str)
For Each m In mc
Debug.Print m.Value
Next m
Next c
End Sub
=======================================
--ron