Regex - Matching URLs

  • 13 years ago

    Hi,
    I am using the following function to extract every URL from a string containing the HTML of a web page:

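    // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
    public List<string> DumpHrefs(string inputString)
    {
        List<string> LstURLs = new List<string>();

        // Capture the href value into group 1, whether it is double-quoted or unquoted.
        Regex r = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
            RegexOptions.IgnoreCase | RegexOptions.Compiled);

        // Walk every match in the input and collect the captured URL text.
        for (Match m = r.Match(inputString); m.Success; m = m.NextMatch())
        {
            LstURLs.Add(m.Groups[1].Value);
        }
        return LstURLs;
    }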
    

     

    However, the problem is that it returns every link on the page, and I only want fully qualified links such as http://www.domain.com/page.html, not relative links.
    Does anyone know how I can modify my regex to do so?
    Regards
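
One possible direction, offered as a minimal sketch rather than a tested answer: keep the shape of the pattern above, but require the captured value to begin with `http://` or `https://`. The class name `HrefDumper` and method name `DumpAbsoluteHrefs` are made up for illustration. The sketch also assumes that "fully qualified" means only the http and https schemes, and, like the original pattern, it does not handle single-quoted attributes. The unquoted branch stops at whitespace, `>` or `"`, which is a small change from the original `\S+`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefDumper
{
    // Assumption: only http:// and https:// links count as fully qualified.
    // Group 1 holds the URL for both the double-quoted and the unquoted form.
    private static readonly Regex AbsoluteHref = new Regex(
        "href\\s*=\\s*(?:\"(?<1>https?://[^\"]*)\"|(?<1>https?://[^\\s>\"]+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static List<string> DumpAbsoluteHrefs(string inputString)
    {
        var urls = new List<string>();

        // Relative links never satisfy the scheme prefix, so they are skipped.
        for (Match m = AbsoluteHref.Match(inputString); m.Success; m = m.NextMatch())
        {
            urls.Add(m.Groups[1].Value);
        }
        return urls;
    }
}
```

Calling `HrefDumper.DumpAbsoluteHrefs(html)` on a page that mixes `href="http://www.domain.com/page.html"` and `href="/page.html"` should return only the first link. For anything beyond simple pages, an HTML parser combined with a URI check is likely to be more reliable than a regex.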
