Home |
Search |
Today's Posts |
|
#1
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
i have a excel .CSV file that Gmail generated for me with my contacts in..
what i want to do is function that go over all the contacts emails one by one and deletes all missformated addresses that cannot be use to send email for them.. somthing like : if (the cell's text is not an valid email address) { delete the cell ; } to be honest i dont know much about excel so i need somthing from scratch like how to put the function in the file and that.. thank you all very much.. |
#2
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Also posted in public.excel.misc
"yakir" wrote in message ... i have a excel .CSV file that Gmail generated for me with my contacts in.. what i want to do is function that go over all the contacts emails one by one and deletes all missformated addresses that cannot be use to send email for them.. somthing like : if (the cell's text is not an valid email address) { delete the cell ; } to be honest i dont know much about excel so i need somthing from scratch like how to put the function in the file and that.. thank you all very much.. |
#3
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
In VBA insert a new module. Then go to the Tools menu, choose
References, and scroll down to and check "Microsoft VBScript Regular Expressions 5.5". Then, use code like the following Sub AAA() Dim LastRow As Long Dim RowNdx As Long Dim TopRow As Long Dim WS As Worksheet Dim Pattern As String Dim RegEx As RegExp Const ADDR_COL = "A" '<<<< CHANGE TopRow = 1 '<<<< CHANGE Pattern = "(^[a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}" & _ "\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})$" Set RegEx = New RegExp RegEx.Pattern = Pattern Set WS = ActiveSheet With WS LastRow = .Cells(.Rows.Count, ADDR_COL).End(xlUp).Row For RowNdx = LastRow To TopRow Step -1 If RegEx.Test(.Cells(RowNdx, ADDR_COL).Text) = False Then .Rows(RowNdx).Delete End If Next RowNdx End With End Sub Change the lines marked with <<<< to the correct values. ADDR_COL should be the column letter of the column containing the email addresses to test. TopRow should be the row number of the top of the range of values to test. The code begins testing the addresses on the row calculated as LastRow and moves upwards until it reaches TopRow, at which point it quits. For each row between LastRow and TopRow (inclusive), if an invalid email address is found in ADDR_COL, that row is deleted. Cordially, Chip Pearson Microsoft Most Valuable Professional, Excel, 1998 - 2010 Pearson Software Consulting, LLC www.cpearson.com On Mon, 24 May 2010 03:05:01 -0700, yakir wrote: i have a excel .CSV file that Gmail generated for me with my contacts in.. what i want to do is function that go over all the contacts emails one by one and deletes all missformated addresses that cannot be use to send email for them.. somthing like : if (the cell's text is not an valid email address) { delete the cell ; } to be honest i dont know much about excel so i need somthing from scratch like how to put the function in the file and that.. thank you all very much.. |
#4
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Mon, 24 May 2010 07:21:17 -0500, Chip Pearson wrote:
In VBA insert a new module. Then go to the Tools menu, choose References, and scroll down to and check "Microsoft VBScript Regular Expressions 5.5". Then, use code like the following Sub AAA() Dim LastRow As Long Dim RowNdx As Long Dim TopRow As Long Dim WS As Worksheet Dim Pattern As String Dim RegEx As RegExp Const ADDR_COL = "A" '<<<< CHANGE TopRow = 1 '<<<< CHANGE Pattern = "(^[a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}" & _ "\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})$" Set RegEx = New RegExp RegEx.Pattern = Pattern Set WS = ActiveSheet With WS LastRow = .Cells(.Rows.Count, ADDR_COL).End(xlUp).Row For RowNdx = LastRow To TopRow Step -1 If RegEx.Test(.Cells(RowNdx, ADDR_COL).Text) = False Then .Rows(RowNdx).Delete End If Next RowNdx End With End Sub Change the lines marked with <<<< to the correct values. ADDR_COL should be the column letter of the column containing the email addresses to test. TopRow should be the row number of the top of the range of values to test. The code begins testing the addresses on the row calculated as LastRow and moves upwards until it reaches TopRow, at which point it quits. For each row between LastRow and TopRow (inclusive), if an invalid email address is found in ADDR_COL, that row is deleted. Cordially, Chip Pearson Microsoft Most Valuable Professional, Excel, 1998 - 2010 Pearson Software Consulting, LLC www.cpearson.com It's tough to devise a regex that can comply with the relevant internet standards. I think the current one is RFC5322 Your regex excludes some email addresses which I believe are valid, and includes some which are invalid. For example, your regex excludes any TLD's that have more than four characters. But the current generic domain list includes .museum & .travel, and there will likely be more, longer lettered TLD's in the future. It also excludes some characters that are valid in email addresses, at least in the local part: ! # $ % & ' * + - / = ? ^ _ ` { | } ~ It allows consecutive dots in the local part of the address, as well as allowing them at the start or end of the local part of the address -- and that is not valid. Since the OP is looking for malformed email addresses, I'd be concerned that your regex would eliminate some valid one's, and include some invalid one's. It can be pretty tough to devise something that does not have false positives or negatives. If IP addresses don't need to be included, I'd suggest something (admittedly incomplete) like (with the case insensitive option (ignorecase) equal to true): "^[-\w.%+']+@[A-Z0-9.-]+\.(?:aero|asia|biz|cat|com|coop|edu|gov|info|int| " _ & "jobs|mil|mobi|museum|name|net|org|pro|tel|tra vel|[A-Z]{2})$" which will allow any two character domain name (to allow for the country codes) as well as the currently used specific list of generic top-level domains. --ron |
#5
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Thanks man..
"Ron Rosenfeld" wrote: On Mon, 24 May 2010 07:21:17 -0500, Chip Pearson wrote: In VBA insert a new module. Then go to the Tools menu, choose References, and scroll down to and check "Microsoft VBScript Regular Expressions 5.5". Then, use code like the following Sub AAA() Dim LastRow As Long Dim RowNdx As Long Dim TopRow As Long Dim WS As Worksheet Dim Pattern As String Dim RegEx As RegExp Const ADDR_COL = "A" '<<<< CHANGE TopRow = 1 '<<<< CHANGE Pattern = "(^[a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}" & _ "\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})$" Set RegEx = New RegExp RegEx.Pattern = Pattern Set WS = ActiveSheet With WS LastRow = .Cells(.Rows.Count, ADDR_COL).End(xlUp).Row For RowNdx = LastRow To TopRow Step -1 If RegEx.Test(.Cells(RowNdx, ADDR_COL).Text) = False Then .Rows(RowNdx).Delete End If Next RowNdx End With End Sub Change the lines marked with <<<< to the correct values. ADDR_COL should be the column letter of the column containing the email addresses to test. TopRow should be the row number of the top of the range of values to test. The code begins testing the addresses on the row calculated as LastRow and moves upwards until it reaches TopRow, at which point it quits. For each row between LastRow and TopRow (inclusive), if an invalid email address is found in ADDR_COL, that row is deleted. Cordially, Chip Pearson Microsoft Most Valuable Professional, Excel, 1998 - 2010 Pearson Software Consulting, LLC www.cpearson.com It's tough to devise a regex that can comply with the relevant internet standards. I think the current one is RFC5322 Your regex excludes some email addresses which I believe are valid, and includes some which are invalid. For example, your regex excludes any TLD's that have more than four characters. But the current generic domain list includes .museum & .travel, and there will likely be more, longer lettered TLD's in the future. It also excludes some characters that are valid in email addresses, at least in the local part: ! # $ % & ' * + - / = ? ^ _ ` { | } ~ It allows consecutive dots in the local part of the address, as well as allowing them at the start or end of the local part of the address -- and that is not valid. Since the OP is looking for malformed email addresses, I'd be concerned that your regex would eliminate some valid one's, and include some invalid one's. It can be pretty tough to devise something that does not have false positives or negatives. If IP addresses don't need to be included, I'd suggest something (admittedly incomplete) like (with the case insensitive option (ignorecase) equal to true): "^[-\w.%+']+@[A-Z0-9.-]+\.(?:aero|asia|biz|cat|com|coop|edu|gov|info|int| " _ & "jobs|mil|mobi|museum|name|net|org|pro|tel|tra vel|[A-Z]{2})$" which will allow any two character domain name (to allow for the country codes) as well as the currently used specific list of generic top-level domains. --ron . |
#6
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Sat, 29 May 2010 00:15:01 -0700, yakir
wrote: Thanks man.. You're welcome. Glad to help. If you are going to be using my modification, be sure to add a line after these two in Chip's routine: Set RegEx = New RegExp RegEx.Pattern = Pattern 'Add this next line: RegEx.IgnoreCase = True --ron |
#7
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Mon, 24 May 2010 07:21:17 -0500, Chip Pearson wrote:
In VBA insert a new module. Then go to the Tools menu, choose References, and scroll down to and check "Microsoft VBScript Regular Expressions 5.5". Then, use code like the following Sub AAA() Dim LastRow As Long Dim RowNdx As Long Dim TopRow As Long Dim WS As Worksheet Dim Pattern As String Dim RegEx As RegExp Const ADDR_COL = "A" '<<<< CHANGE TopRow = 1 '<<<< CHANGE Pattern = "(^[a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}" & _ "\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})$" Set RegEx = New RegExp RegEx.Pattern = Pattern Set WS = ActiveSheet With WS LastRow = .Cells(.Rows.Count, ADDR_COL).End(xlUp).Row For RowNdx = LastRow To TopRow Step -1 If RegEx.Test(.Cells(RowNdx, ADDR_COL).Text) = False Then .Rows(RowNdx).Delete End If Next RowNdx End With End Sub Change the lines marked with <<<< to the correct values. ADDR_COL should be the column letter of the column containing the email addresses to test. TopRow should be the row number of the top of the range of values to test. The code begins testing the addresses on the row calculated as LastRow and moves upwards until it reaches TopRow, at which point it quits. For each row between LastRow and TopRow (inclusive), if an invalid email address is found in ADDR_COL, that row is deleted. Cordially, Chip Pearson Microsoft Most Valuable Professional, Excel, 1998 - 2010 Pearson Software Consulting, LLC www.cpearson.com It's tough to devise a regex that can comply with the relevant internet standards. I think the current one is RFC5322 Your regex excludes some email addresses which I believe are valid, and includes some which are invalid. For example, your regex excludes any TLD's that have more than four characters. But the current generic domain list includes .museum & .travel, and there will likely be more, longer lettered TLD's in the future. It also excludes some characters that are valid in email addresses, at least in the local part: ! # $ % & ' * + - / = ? ^ _ ` { | } ~ It allows consecutive dots in the local part of the address, as well as allowing them at the start or end of the local part of the address -- and that is not valid. Since the OP is looking for malformed email addresses, I'd be concerned that your regex would eliminate some valid one's, and include some invalid one's. It can be pretty tough to devise something that does not have false positives or negatives. If IP addresses don't need to be included, I'd suggest something (admittedly incomplete) like (with the case insensitive option (ignorecase) equal to true): "^[-\w.%+']+@[A-Z0-9.-]+\.(?:aero|asia|biz|cat|com|coop|edu|gov|info|int| " _ & "jobs|mil|mobi|museum|name|net|org|pro|tel|tra vel|[A-Z]{2})$" which will allow any two character domain name (to allow for the country codes) as well as the currently used specific list of generic top-level domains. --ron --ron |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
MMult function . Highly frustrating experience please help. | Excel Programming | |||
Highly specific Exell Problem | Excel Programming | |||
A highly profitable system? | Excel Discussion (Misc queries) | |||
Highly Complex Totals of Data | Excel Discussion (Misc queries) | |||
How to make a forecast for a highly seasonal business? | Excel Discussion (Misc queries) |