Regex - Matching URLs

  • 9 years ago

    Hi,
    I am using the following function to extract the URLs from a string containing the HTML of a web page:

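    // Requires: using System.Collections.Generic;
    //           using System.Text.RegularExpressions;

    // Build the compiled regex once instead of on every call.
    // Group 1 captures either a double-quoted value or an unquoted value;
    // the unquoted form stops at whitespace or '>' so the tag's closing
    // bracket is not captured as part of the URL.
    private static readonly Regex HrefRegex = new Regex(
        "href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>[^\\s>]+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public List<string> DumpHrefs(string inputString)
    {
        List<string> lstUrls = new List<string>();

        for (Match m = HrefRegex.Match(inputString); m.Success; m = m.NextMatch())
        {
            lstUrls.Add(m.Groups[1].Value);
        }
        return lstUrls;
    }
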
    However, the problem is that this returns every link on the page, and I only want fully qualified (absolute) links such as http://www.domain.com/page.html, not relative links.
    Does anyone know how I can modify my regex to do so?
    Regards
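One way to do this, sketched under the assumption that "fully qualified" means an href that starts with http:// or https://, is to put the scheme into the pattern itself so relative values never match. The class and method names (LinkExtractor, DumpAbsoluteHrefs) are hypothetical, and unlike the original pattern this one also accepts single-quoted attribute values. As with any regex over HTML, it will also pick up href text inside comments or scripts, so an HTML parser is more robust for arbitrary pages.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class; not part of any library.
public static class LinkExtractor
{
    // Matches href="...", href='...' or an unquoted href=..., but only when the
    // value begins with http:// or https:// (IgnoreCase also accepts HTTP://).
    // Relative values such as "page.html", "/page.html" or "//host/page" fail
    // every alternative and are skipped. .NET allows the same group name
    // ("url") in each alternative.
    private static readonly Regex AbsoluteHref = new Regex(
        @"href\s*=\s*(?:""(?<url>https?://[^""]*)""|'(?<url>https?://[^']*)'|(?<url>https?://[^\s>""']+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the absolute http/https links found in the given HTML, in order.
    public static List<string> DumpAbsoluteHrefs(string html)
    {
        var urls = new List<string>();
        foreach (Match m in AbsoluteHref.Matches(html))
        {
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }
}
```

Called on `<a href="http://www.domain.com/page.html">a</a><a href="page2.html">b</a>`, it should return only http://www.domain.com/page.html. If you would rather keep DumpHrefs unchanged, an alternative is to filter its results with Uri.TryCreate(value, UriKind.Absolute, out Uri uri) and then check that uri.Scheme equals Uri.UriSchemeHttp or Uri.UriSchemeHttps; the scheme check matters because on non-Windows .NET a value such as "/page.html" can parse as an absolute file: URI.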
