Converting HTML to Text

Whether you want to convert an HTML page into plain text so you can parse out one particular piece of information, or you simply want to load a page from the Net into your own word processing package, this mini function could come in handy.

It’s called StripTags and accepts an HTML string. Using a regular expression, it identifies every tag (a < followed by everything up to the next >), removes it, and returns the modified string. The text between the tags is left untouched. Here’s the code:

Public Function StripTags(ByVal HTML As String) As String
    ' Removes tags from passed HTML.
    ' Regex.Replace is a Shared method, so it is called on the
    ' class itself; no Regex instance needs to be declared.
    Return System.Text.RegularExpressions.Regex.Replace( _
        HTML, "<[^>]*>", "")
End Function

Here’s a simple example demonstrating how you could use this function in code (see Figure 7-2 for my sample application):

Dim strData As String
strData = StripTags("<body><b>Welcome!</b></body>")
' strData now contains "Welcome!"

I admit, it doesn’t look like much, but this little snippet can be a true lifesaver, especially if you’ve ever tried doing it yourself with the InStr and Mid functions. Have fun!
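
One caveat: StripTags only removes the tags themselves. HTML entities such as &amp; stay encoded, and the text inside <script> and <style> blocks is kept. The following is a rough sketch of one way to handle both, not part of the original tip. The names HtmlTextHelpers and HtmlToText are made up for illustration, and the code assumes .NET Framework 4.0 or later, where System.Net.WebUtility.HtmlDecode is available.

```vbnet
Imports System.Net
Imports System.Text.RegularExpressions

Public Module HtmlTextHelpers
    ' Hypothetical helper: strips tags like StripTags, but also drops
    ' script/style blocks and decodes HTML entities.
    Public Function HtmlToText(ByVal html As String) As String
        ' Remove <script> and <style> blocks together with their contents.
        ' Singleline lets "." match line breaks inside the block.
        Dim result As String = Regex.Replace(html, _
            "<(script|style)\b[^>]*>.*?</\1\s*>", "", _
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)

        ' Remove the remaining tags, exactly as StripTags does.
        result = Regex.Replace(result, "<[^>]*>", "")

        ' Turn entities such as &amp; and &lt; back into characters.
        Return WebUtility.HtmlDecode(result).Trim()
    End Function
End Module
```

For example, HtmlToText("<p>Fish &amp; chips</p><script>alert(1)</script>") returns "Fish & chips". Regular expressions are still only an approximation of an HTML parser (a > inside an attribute value will confuse the tag pattern), so treat this as a quick utility rather than a robust converter.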


Karl Moore