Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 6
Default please help.. highly important...

i have a excel .CSV file that Gmail generated for me with my contacts in..
what i want to do is function that go over all the contacts emails one by
one and deletes all missformated addresses that cannot be use to send email
for them.. somthing like :

if (the cell's text is not an valid email address)
{
delete the cell ;
}

to be honest i dont know much about excel so i need somthing from
scratch like how to put the function in the file and that..

thank you all very much..

  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1,549
Default please help.. highly important...

Also posted in public.excel.misc



"yakir"
wrote in message ...
i have a excel .CSV file that Gmail generated for me with my contacts in..
what i want to do is function that go over all the contacts emails one by
one and deletes all missformated addresses that cannot be use to send email
for them.. somthing like :

if (the cell's text is not an valid email address)
{
delete the cell ;
}

to be honest i dont know much about excel so i need somthing from
scratch like how to put the function in the file and that..

thank you all very much..

  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 7,247
Default please help.. highly important...

In VBA insert a new module. Then go to the Tools menu, choose
References, and scroll down to and check "Microsoft VBScript Regular
Expressions 5.5". Then, use code like the following

Sub AAA()
Dim LastRow As Long
Dim RowNdx As Long
Dim TopRow As Long
Dim WS As Worksheet
Dim Pattern As String
Dim RegEx As RegExp
Const ADDR_COL = "A" '<<<< CHANGE
TopRow = 1 '<<<< CHANGE
Pattern = "(^[a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}" & _
"\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})$"
Set RegEx = New RegExp
RegEx.Pattern = Pattern
Set WS = ActiveSheet
With WS
LastRow = .Cells(.Rows.Count, ADDR_COL).End(xlUp).Row
For RowNdx = LastRow To TopRow Step -1
If RegEx.Test(.Cells(RowNdx, ADDR_COL).Text) = False Then
.Rows(RowNdx).Delete
End If
Next RowNdx
End With
End Sub


Change the lines marked with <<<< to the correct values. ADDR_COL
should be the column letter of the column containing the email
addresses to test. TopRow should be the row number of the top of the
range of values to test. The code begins testing the addresses on the
row calculated as LastRow and moves upwards until it reaches TopRow,
at which point it quits. For each row between LastRow and TopRow
(inclusive), if an invalid email address is found in ADDR_COL, that
row is deleted.

Cordially,
Chip Pearson
Microsoft Most Valuable Professional,
Excel, 1998 - 2010
Pearson Software Consulting, LLC
www.cpearson.com





On Mon, 24 May 2010 03:05:01 -0700, yakir
wrote:

i have a excel .CSV file that Gmail generated for me with my contacts in..
what i want to do is function that go over all the contacts emails one by
one and deletes all missformated addresses that cannot be use to send email
for them.. somthing like :

if (the cell's text is not an valid email address)
{
delete the cell ;
}

to be honest i dont know much about excel so i need somthing from
scratch like how to put the function in the file and that..

thank you all very much..

  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default please help.. highly important...

On Mon, 24 May 2010 07:21:17 -0500, Chip Pearson wrote:

In VBA insert a new module. Then go to the Tools menu, choose
References, and scroll down to and check "Microsoft VBScript Regular
Expressions 5.5". Then, use code like the following

Sub AAA()
Dim LastRow As Long
Dim RowNdx As Long
Dim TopRow As Long
Dim WS As Worksheet
Dim Pattern As String
Dim RegEx As RegExp
Const ADDR_COL = "A" '<<<< CHANGE
TopRow = 1 '<<<< CHANGE
Pattern = "(^[a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}" & _
"\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})$"
Set RegEx = New RegExp
RegEx.Pattern = Pattern
Set WS = ActiveSheet
With WS
LastRow = .Cells(.Rows.Count, ADDR_COL).End(xlUp).Row
For RowNdx = LastRow To TopRow Step -1
If RegEx.Test(.Cells(RowNdx, ADDR_COL).Text) = False Then
.Rows(RowNdx).Delete
End If
Next RowNdx
End With
End Sub


Change the lines marked with <<<< to the correct values. ADDR_COL
should be the column letter of the column containing the email
addresses to test. TopRow should be the row number of the top of the
range of values to test. The code begins testing the addresses on the
row calculated as LastRow and moves upwards until it reaches TopRow,
at which point it quits. For each row between LastRow and TopRow
(inclusive), if an invalid email address is found in ADDR_COL, that
row is deleted.

Cordially,
Chip Pearson
Microsoft Most Valuable Professional,
Excel, 1998 - 2010
Pearson Software Consulting, LLC
www.cpearson.com





It's tough to devise a regex that can comply with the relevant internet
standards. I think the current one is RFC5322

Your regex excludes some email addresses which I believe are valid, and
includes some which are invalid.

For example, your regex excludes any TLD's that have more than four characters.
But the current generic domain list includes .museum & .travel, and there will
likely be more, longer lettered TLD's in the future.

It also excludes some characters that are valid in email addresses, at least in
the local part:

! # $ % & ' * + - / = ? ^ _ ` { | } ~

It allows consecutive dots in the local part of the address, as well as
allowing them at the start or end of the local part of the address -- and that
is not valid.

Since the OP is looking for malformed email addresses, I'd be concerned that
your regex would eliminate some valid one's, and include some invalid one's.

It can be pretty tough to devise something that does not have false positives
or negatives. If IP addresses don't need to be included, I'd suggest something
(admittedly incomplete) like (with the case insensitive option (ignorecase)
equal to true):


"^[-\w.%+']+@[A-Z0-9.-]+\.(?:aero|asia|biz|cat|com|coop|edu|gov|info|int| " _
& "jobs|mil|mobi|museum|name|net|org|pro|tel|tra vel|[A-Z]{2})$"

which will allow any two character domain name (to allow for the country codes)
as well as the currently used specific list of generic top-level domains.



--ron
  #5   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 6
Default please help.. highly important...

Thanks man..

"Ron Rosenfeld" wrote:

On Mon, 24 May 2010 07:21:17 -0500, Chip Pearson wrote:

In VBA insert a new module. Then go to the Tools menu, choose
References, and scroll down to and check "Microsoft VBScript Regular
Expressions 5.5". Then, use code like the following

Sub AAA()
Dim LastRow As Long
Dim RowNdx As Long
Dim TopRow As Long
Dim WS As Worksheet
Dim Pattern As String
Dim RegEx As RegExp
Const ADDR_COL = "A" '<<<< CHANGE
TopRow = 1 '<<<< CHANGE
Pattern = "(^[a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}" & _
"\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})$"
Set RegEx = New RegExp
RegEx.Pattern = Pattern
Set WS = ActiveSheet
With WS
LastRow = .Cells(.Rows.Count, ADDR_COL).End(xlUp).Row
For RowNdx = LastRow To TopRow Step -1
If RegEx.Test(.Cells(RowNdx, ADDR_COL).Text) = False Then
.Rows(RowNdx).Delete
End If
Next RowNdx
End With
End Sub


Change the lines marked with <<<< to the correct values. ADDR_COL
should be the column letter of the column containing the email
addresses to test. TopRow should be the row number of the top of the
range of values to test. The code begins testing the addresses on the
row calculated as LastRow and moves upwards until it reaches TopRow,
at which point it quits. For each row between LastRow and TopRow
(inclusive), if an invalid email address is found in ADDR_COL, that
row is deleted.

Cordially,
Chip Pearson
Microsoft Most Valuable Professional,
Excel, 1998 - 2010
Pearson Software Consulting, LLC
www.cpearson.com





It's tough to devise a regex that can comply with the relevant internet
standards. I think the current one is RFC5322

Your regex excludes some email addresses which I believe are valid, and
includes some which are invalid.

For example, your regex excludes any TLD's that have more than four characters.
But the current generic domain list includes .museum & .travel, and there will
likely be more, longer lettered TLD's in the future.

It also excludes some characters that are valid in email addresses, at least in
the local part:

! # $ % & ' * + - / = ? ^ _ ` { | } ~

It allows consecutive dots in the local part of the address, as well as
allowing them at the start or end of the local part of the address -- and that
is not valid.

Since the OP is looking for malformed email addresses, I'd be concerned that
your regex would eliminate some valid one's, and include some invalid one's.

It can be pretty tough to devise something that does not have false positives
or negatives. If IP addresses don't need to be included, I'd suggest something
(admittedly incomplete) like (with the case insensitive option (ignorecase)
equal to true):


"^[-\w.%+']+@[A-Z0-9.-]+\.(?:aero|asia|biz|cat|com|coop|edu|gov|info|int| " _
& "jobs|mil|mobi|museum|name|net|org|pro|tel|tra vel|[A-Z]{2})$"

which will allow any two character domain name (to allow for the country codes)
as well as the currently used specific list of generic top-level domains.



--ron
.



  #6   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default please help.. highly important...

On Sat, 29 May 2010 00:15:01 -0700, yakir
wrote:

Thanks man..


You're welcome. Glad to help.

If you are going to be using my modification, be sure to add a line after
these two in Chip's routine:


Set RegEx = New RegExp
RegEx.Pattern = Pattern

'Add this next line:
RegEx.IgnoreCase = True

--ron
  #7   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default please help.. highly important...

On Mon, 24 May 2010 07:21:17 -0500, Chip Pearson wrote:

In VBA insert a new module. Then go to the Tools menu, choose
References, and scroll down to and check "Microsoft VBScript Regular
Expressions 5.5". Then, use code like the following

Sub AAA()
Dim LastRow As Long
Dim RowNdx As Long
Dim TopRow As Long
Dim WS As Worksheet
Dim Pattern As String
Dim RegEx As RegExp
Const ADDR_COL = "A" '<<<< CHANGE
TopRow = 1 '<<<< CHANGE
Pattern = "(^[a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}" & _
"\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})$"
Set RegEx = New RegExp
RegEx.Pattern = Pattern
Set WS = ActiveSheet
With WS
LastRow = .Cells(.Rows.Count, ADDR_COL).End(xlUp).Row
For RowNdx = LastRow To TopRow Step -1
If RegEx.Test(.Cells(RowNdx, ADDR_COL).Text) = False Then
.Rows(RowNdx).Delete
End If
Next RowNdx
End With
End Sub


Change the lines marked with <<<< to the correct values. ADDR_COL
should be the column letter of the column containing the email
addresses to test. TopRow should be the row number of the top of the
range of values to test. The code begins testing the addresses on the
row calculated as LastRow and moves upwards until it reaches TopRow,
at which point it quits. For each row between LastRow and TopRow
(inclusive), if an invalid email address is found in ADDR_COL, that
row is deleted.

Cordially,
Chip Pearson
Microsoft Most Valuable Professional,
Excel, 1998 - 2010
Pearson Software Consulting, LLC
www.cpearson.com





It's tough to devise a regex that can comply with the relevant internet
standards. I think the current one is RFC5322

Your regex excludes some email addresses which I believe are valid, and
includes some which are invalid.

For example, your regex excludes any TLD's that have more than four characters.
But the current generic domain list includes .museum & .travel, and there will
likely be more, longer lettered TLD's in the future.

It also excludes some characters that are valid in email addresses, at least in
the local part:

! # $ % & ' * + - / = ? ^ _ ` { | } ~

It allows consecutive dots in the local part of the address, as well as
allowing them at the start or end of the local part of the address -- and that
is not valid.

Since the OP is looking for malformed email addresses, I'd be concerned that
your regex would eliminate some valid one's, and include some invalid one's.

It can be pretty tough to devise something that does not have false positives
or negatives. If IP addresses don't need to be included, I'd suggest something
(admittedly incomplete) like (with the case insensitive option (ignorecase)
equal to true):


"^[-\w.%+']+@[A-Z0-9.-]+\.(?:aero|asia|biz|cat|com|coop|edu|gov|info|int| " _
& "jobs|mil|mobi|museum|name|net|org|pro|tel|tra vel|[A-Z]{2})$"

which will allow any two character domain name (to allow for the country codes)
as well as the currently used specific list of generic top-level domains.



--ron
--ron
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
MMult function . Highly frustrating experience please help. BEETAL Excel Programming 2 September 8th 08 05:02 PM
Highly specific Exell Problem Kataklist Excel Programming 4 July 15th 08 08:37 AM
A highly profitable system? n1x5icbk[_2_] Excel Discussion (Misc queries) 0 May 9th 08 07:57 AM
Highly Complex Totals of Data PaulW Excel Discussion (Misc queries) 5 December 28th 06 05:05 PM
How to make a forecast for a highly seasonal business? [email protected] Excel Discussion (Misc queries) 1 January 25th 05 08:37 AM


All times are GMT +1. The time now is 06:30 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright ©2004-2025 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"