William Duffy

Glasgow Based C# ASP.NET Web Developer

ASP.NET’s “Internet Url” Regular Expression Validator

The regular expression validator is a great tool when you need a fast, robust content validator. Drop it on the page, choose its ControlToValidate and give it an expression to match against the specified control’s content. There are even built-in expressions to choose from, one of which is “Internet URL”.

The problem is that the default Internet URL expression (see below) has a few shortfalls.
http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?

  1. http or https is required, yet many internet users don’t even know the protocol exists; they think every URL begins with www.
  2. Some crazy URLs (Visit Scotland, I’m looking at you!) contain characters that this expression does not support (e.g. , or ;).
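
Both shortfalls are easy to reproduce outside a page. The following is only a minimal sketch: the class name, the IsValid helper and the example.com sample URLs are made up for illustration, and the pattern is anchored on the assumption that the validator only accepts a match covering the whole value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DefaultUrlExpressionDemo
{
    // The built-in "Internet URL" expression quoted above.
    public const string DefaultPattern = @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";

    // Hypothetical helper: anchors the pattern so the whole input must match.
    public static bool IsValid(string pattern, string input)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Main()
    {
        string[] samples =
        {
            "http://www.example.com/about",   // protocol present: accepted
            "www.example.com/about",          // no protocol: rejected
            "http://www.example.com/a,b;c"    // comma and semicolon in the path: rejected
        };

        foreach (string sample in samples)
            Console.WriteLine("{0} -> {1}", sample, IsValid(DefaultPattern, sample));
    }
}
```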

So, to get around these little issues, I altered the expression a little (see below).
(http(s)?://)?([\w-]+\.)+[\w-]+(/[\w- ;,./?%&=]*)?
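
As a rough check, the altered expression can be run against the kinds of input the default one rejects. Again this is a sketch under assumptions: the class name, the IsValid helper and the sample URLs are illustrative, and the anchoring stands in for a whole-value match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlteredUrlExpressionDemo
{
    // The altered expression: optional protocol, plus ; and , allowed in the path.
    public const string AlteredPattern = @"(http(s)?://)?([\w-]+\.)+[\w-]+(/[\w- ;,./?%&=]*)?";

    // Hypothetical helper: anchors the pattern so the whole input must match.
    public static bool IsValid(string pattern, string input)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Main()
    {
        string[] samples =
        {
            "www.example.com/about",           // no protocol: now accepted
            "https://www.example.com/a,b;c",   // comma and semicolon: now accepted
            "example"                          // no dot at all: still rejected
        };

        foreach (string sample in samples)
            Console.WriteLine("{0} -> {1}", sample, IsValid(AlteredPattern, sample));
    }
}
```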
The altered expression supports a wider range of URLs, but there is a small gotcha to be aware of. If the user enters a URL without http or https, an anchor with that URL applied will break: an external URL requires the protocol, and without it the browser treats the link as relative. The solution is simple: check for a protocol when saving the value. (Note: we can’t possibly know whether a URL is supposed to be http or https, so if a user doesn’t enter a protocol we prepend http. If they did enter a protocol, nothing is prepended.)

      entity.Url = StringTools.EnsureProtocol(txtUrl.Text, StringTools.Protocol.Http);
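
To see why a missing protocol matters, the relative-link behaviour can be approximated with System.Uri, which resolves a reference against a base address in much the same way a browser resolves an href against the current page. This is just an illustration; the page address mysite.example and the class name are invented.

```csharp
using System;

public static class RelativeLinkDemo
{
    // Resolves an href against the page it appears on.
    public static string Resolve(string pageUrl, string href)
    {
        return new Uri(new Uri(pageUrl), href).AbsoluteUri;
    }

    public static void Main()
    {
        const string page = "http://mysite.example/links/"; // hypothetical page holding the anchor

        // Without a protocol the value is treated as a path relative to the page.
        Console.WriteLine(Resolve(page, "www.visitscotland.com"));
        // prints http://mysite.example/links/www.visitscotland.com

        // With a protocol the value is an absolute, external address.
        Console.WriteLine(Resolve(page, "http://www.visitscotland.com"));
        // prints http://www.visitscotland.com/
    }
}
```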

Below is my EnsureProtocol helper method, but you can easily write the check inline if you wish. Of course, I recommend you always extract your common code into reusable methods, but that’s another discussion!

// Requires: using System.Text.RegularExpressions;

///<summary>
///Defines the type of prefix protocol
///</summary>
public enum Protocol
{
    Http,
    Https,
    Ftp,
    Smtp,
    Pop,
    Mail
}

///<summary>
///Assesses a url and returns a url that is guaranteed to begin with a protocol
///</summary>
///<param name="url">The url to be checked for a protocol</param>
///<param name="protocol">The protocol to prepend when the url has none</param>
///<returns>The url unchanged if it already begins with a protocol, otherwise the url prefixed with the specified protocol</returns>
public static string EnsureProtocol(string url, Protocol protocol)
{
    // Leave empty input alone, and leave any url that already starts with a
    // protocol (e.g. https://) untouched so we never produce http://https://...
    if (string.IsNullOrEmpty(url) || Regex.IsMatch(url, @"^[a-z][a-z0-9+.\-]*://", RegexOptions.IgnoreCase))
        return url;

    return string.Format("{0}://{1}", protocol.ToString().ToLowerInvariant(), url);
}
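
For a quick feel of the intended behaviour, here is a small self-contained sketch. To keep it short it uses a stand-in version of the helper that takes the scheme as a plain string instead of the Protocol enum, and it assumes that a value starting with any scheme should be left untouched; the class name and sample URLs are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EnsureProtocolDemo
{
    // Stand-in for the helper: prepends the scheme only when the value has none.
    public static string EnsureProtocol(string url, string scheme)
    {
        bool hasScheme = !string.IsNullOrEmpty(url)
            && Regex.IsMatch(url, @"^[a-z][a-z0-9+.\-]*://", RegexOptions.IgnoreCase);

        if (string.IsNullOrEmpty(url) || hasScheme)
            return url;

        return scheme + "://" + url;
    }

    public static void Main()
    {
        Console.WriteLine(EnsureProtocol("www.example.com", "http"));         // prints http://www.example.com
        Console.WriteLine(EnsureProtocol("https://www.example.com", "http")); // prints https://www.example.com
        Console.WriteLine(EnsureProtocol("HTTP://www.example.com", "http"));  // prints HTTP://www.example.com
    }
}
```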

Happy Coding!


Categorized as ASP.NET, C#

11 Comments

  1. Jamie

    How about making some tutorials that C# noobs like me can dive into? :)

  2. William

    Sure thing Jamie, I have a few ideas for beginners’ tutorials. Subscribe to the RSS feed to keep an eye out for them getting posted.

  3. Gary

    Awesome, that’s a really cool/helpful idea. The snippet at the end could be mighty useful too.

  4. David Berman

    Excellent article, great job with this regex. I am using this:
    (?\w+):\/\/(?[\w@][\w.:@]+)\/?[\w\.?=%&=\-@/$,]*|(http(s)?://)?([\w-]+\.)+[\w-]+(/[\w- ;,./?%&=]*)?

    I took the regular URL expression and then added yours onto the end, so if http:// is specified the first pattern is matched, but if the user doesn’t use a protocol, your pattern has a chance to find a match. Between the two it’s working very well.

  5. Trevor

    This regular expression is not fully correct:
    http://www.wduffy.co.uk/2009/04/11/aspnets-website-url-regular-expression-validator/

    It accepts more than 3 w’s in the address, which is wrong. It should accept only 3 w’s in the URL.

    e.g. www.yahoo.com is correct

    wwww.yahoo.com is wrong

    Thanks, waiting for the reply.

  6. William

    Hi Trevor. The example you gave is not a fault; it is by design and perfectly legal. www is nothing more than a subdomain. www and wwww are both legal subdomains, as is a URL with no subdomain. If you were to force www then validating against any other subdomain would not be possible. For example, http://www.google.co.uk is legal, and so are google.co.uk and news.google.co.uk. There can even be subdomains of subdomains, for example finance.news.google.co.uk. The validator can match all of these successfully.

  7. Murthy

    Hi,

    I am looking for an expression to validate the URL below.

    http://www.Test.com.in.net

    I tried the expression, but was unable to validate it.

  8. William

    Hi Murthy, I ran the expression through RegexBuddy with your URL and it matched OK. I’m not sure why you are not getting a match. Perhaps try it without the capital T to see if you have any more luck, though that shouldn’t be an issue as the expression uses the word character shorthand \w, which covers [a-zA-Z_0-9].

  9. NLV

    I want to validate both internet and intranet URLs, but it is failing for intranet URLs such as

    http://local
    http://local/finance/

    Any ideas?

  10. William

    @NLV I would use an OR operator in the expression to match either. The following should handle your scenario: ((http(s)?://)?([\w-]+\.)+[\w-]+(/[\w- ;,./?%&=]*)?|http(s)?://local(/[\w-/]*)?) . Your local URLs are identified by the http(s)?://local(/[\w-/]*)? part of the expression, with or without a trailing path.

  11. Oleg

    Thank you for the regexp.
