Converting HTML to Text

This is a comment thread discussing Converting HTML to Text
  • 9 years ago
    This is very helpful. Additionally, most people will want to do a bit more formatting: first replace all line breaks in the source with an empty string, then replace all HTML line breaks (<br>, etc.) with real line breaks, so that the end result looks more like the original document.

    Finally, what regular expression would you use to remove not only the HTML tags but also the text between particular tags? Specifically, I am thinking of <title>Blah blah blah</title>.
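
    One possible answer to the <title> question, as a minimal sketch rather than a definitive solution: match the opening tag, the shortest run of text after it, and the closing tag, then replace the whole match with an empty string. The class name HtmlElementStripper and the method RemoveElement are hypothetical names chosen for illustration. A regular expression like this assumes the element is not nested inside itself and copes poorly with malformed markup.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlElementStripper
{
    // Hypothetical helper (not part of the original snippet): removes every
    // <tagName ...>...</tagName> element, including the text between the tags.
    public static string RemoveElement(string html, string tagName)
    {
        string name = Regex.Escape(tagName);

        // Opening tag with optional attributes, then the shortest possible run
        // of any characters (lazy .*?), then the matching closing tag.
        string pattern = "<" + name + @"\b[^>]*>.*?</" + name + @"\s*>";

        // IgnoreCase handles <TITLE>; Singleline lets '.' match line breaks,
        // so elements that span several lines are removed too.
        return Regex.Replace(html, pattern, "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

    Calling HtmlElementStripper.RemoveElement(html, "title") before stripping the remaining tags would drop the title text from the result. The same call could be repeated for other elements whose content should not appear in the text.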
  • 6 years ago
    This code snippet ROCKS. EXACTLY what I needed. THANKS! :D

  • 5 years ago
    I agree, it really is a lifesaver. If anyone wants a C# version, here it is.

    private string StripTags(string html)
    {
        // Remove every tag (anything between '<' and '>') from the passed HTML.
        return System.Text.RegularExpressions.Regex.Replace(html, "<[^>]*>", "");
    }

    Thanks,

    Almino

  • 5 years ago

    Almino,

    I have been given the task of converting an HTML string with tags to text. It looks as if this function will work, but it won't handle all cases, such as <br> and HTML-encoded spaces (&nbsp;).

    Do you know how to modify the existing code to handle these? Your help is appreciated. I am not at all good with regular expressions. Thanks.

    - Raju
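
    One way the function could be extended to cover <br> and &nbsp;, offered as a sketch under a few assumptions rather than a finished converter: HtmlToText and Convert are hypothetical names, System.Net.WebUtility is assumed to be available (.NET Framework 4.0 or later; on older versions HttpUtility.HtmlDecode in System.Web plays the same role), and source line breaks are turned into spaces instead of an empty string so that words on adjacent lines do not run together.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlToText
{
    // Hypothetical helper: a small pipeline built around the thread's tag-stripping regex.
    public static string Convert(string html)
    {
        // 1. Line breaks in the HTML source are not meaningful; treat them as spaces.
        string text = Regex.Replace(html, @"\r\n|\r|\n", " ");

        // 2. <br>, <br/>, <br /> and closing </p> tags become real line breaks.
        text = Regex.Replace(text, @"<br\s*/?>|</p\s*>", "\n", RegexOptions.IgnoreCase);

        // 3. Strip every remaining tag, using the same pattern as StripTags.
        text = Regex.Replace(text, "<[^>]*>", "");

        // 4. Decode character entities such as &nbsp;, &amp; and &lt;.
        text = WebUtility.HtmlDecode(text);

        // 5. &nbsp; decodes to a non-breaking space (U+00A0); use a plain space instead.
        return text.Replace('\u00A0', ' ');
    }
}
```

    For example, HtmlToText.Convert("Hello<br>World&nbsp;!") returns "Hello", a line break, and then "World !". Decoding entities after the tags are stripped keeps an encoded &lt; from being mistaken for the start of a tag.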
