Creating a Generic Site-To-Rss Tool

Writing the XML

Writing the RSS Feed to Either a File or an In-memory Stream

We want our class to be able to write the XML it creates either to a file or to a string that it returns. Why build an in-memory string rather than a file? If we create an .aspx page that returns that XML, building a string in memory is much easier than writing the XML to a file. Writing to a file from ASP.NET requires more security permissions, and you also have to handle the file being accessed concurrently from multiple sessions, which is more hassle than it is worth for our needs.

So a Web client calls the GetRss() method, which returns a string, while a WinForms client that wants to write to a file calls the WriteRss() method. Internally, both methods use the same implementation: an overloaded WriteRss() method that accepts a TextWriter object and writes the XML to it. The difference is that GetRss() calls this method with a StreamWriter that sits on top of a MemoryStream, while the file-based WriteRss() calls it with a StreamWriter that writes directly to a file. Here's the code for the generic WriteRss() method:

    '''
    '''    Writes the resolved RSS feed to a text writer
    '''    and returns the text that was written (if it was written to a file)
    '''
    Public Overloads Function WriteRSS(ByVal txWriter As TextWriter, _
            ByVal closeAfterFinish As Boolean) As String

        ParseHtml()
        Dim found As MatchCollection = m_FoundRegex.Matches(DownloadedHtml)
        Dim tr As New StringWriter

        Dim writer As New XmlTextWriter(txWriter)
        writer.Formatting = Formatting.Indented

        WritePrologue(writer)

        'write the individual news items
        For Each aMatch As Match In found
            Dim link As String = LinksPrefix & aMatch.Groups("link").Value
            Dim title As String = aMatch.Groups("title").Value
            Dim description As String = aMatch.Groups("description").Value
            Dim pubDate As String = _
                DateTime.Parse(aMatch.Groups("pubDate").Value).ToString("r")
            Dim subject As String = aMatch.Groups("category").Value

            AddRssItem(writer, title.Trim(), link, description, pubDate, subject)
        Next

        ''finish all tags
        writer.WriteEndDocument()
        writer.Flush()

        Dim strResult As String = String.Empty

        If closeAfterFinish Then
            writer.Close()

            'return the result that was written
            'if this was written to a file
            Try
                Dim sr As StreamReader = File.OpenText(FileName)
                strResult = sr.ReadToEnd()
                sr.Close()

            Catch ex As Exception
                'best effort: if the file cannot be read,
                'fall through and return an empty string
            End Try
        End If

        Return strResult
    End Function

One thing of note in this method is the flag that tells it whether to close the text writer once it has finished writing. This detail matters because, when writing to a memory stream, the calling method needs the stream to stay open after the call so that it can retrieve the text inside and return it as the result. The other caller, which writes to a file, has no such need, so it passes the flag as True. In that case, WriteRss() reads back the XML file that was just written and returns its contents as the result string.
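
The shared-implementation pattern described above can be sketched on its own, outside the real class. The following is a minimal illustration, not the article's actual code: the module name and the WriteCore, WriteToFile and WriteToString functions are hypothetical, the "feed" is a single empty element, and the sketch uses a UTF-8 encoding without a byte-order mark so that both results are directly comparable.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module WriterPatternSketch

    ' Assumption of this sketch: UTF-8 without a byte-order mark.
    Private ReadOnly Utf8NoBom As New UTF8Encoding(False)

    ' Shared implementation: writes a tiny XML document to any TextWriter.
    ' closeAfterFinish plays the same role as the flag discussed above.
    Sub WriteCore(ByVal target As TextWriter, ByVal closeAfterFinish As Boolean)
        Dim writer As New XmlTextWriter(target)
        writer.WriteStartDocument()
        writer.WriteStartElement("rss")
        writer.WriteAttributeString("version", "2.0")
        writer.WriteEndElement()
        writer.WriteEndDocument()
        writer.Flush()
        If closeAfterFinish Then
            writer.Close() ' also closes the underlying TextWriter
        End If
    End Sub

    ' File-based caller: lets the core close the writer, then reads the file back.
    Function WriteToFile(ByVal fileName As String) As String
        WriteCore(New StreamWriter(fileName, False, Utf8NoBom), True)
        Return File.ReadAllText(fileName, Utf8NoBom)
    End Function

    ' In-memory caller: keeps the stream open so its contents can still be read.
    Function WriteToString() As String
        Dim ms As New MemoryStream()
        Dim sw As New StreamWriter(ms, Utf8NoBom)
        WriteCore(sw, False)
        Dim text As String = Utf8NoBom.GetString(ms.GetBuffer(), 0, CInt(ms.Length))
        sw.Close()
        Return text
    End Function

    Sub Main()
        Console.WriteLine(WriteToString())
    End Sub

End Module
```

Both callers stay a few lines long because all of the XML writing lives in one place; only the choice of TextWriter and the value of the flag differ.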

Working with a MemoryStream and the Case for XML Encoding

Writing the XML directly to a string in-memory proved to be rather tricky. The XmlTextWriter has three constructors:

  • XmlTextWriter(fileName as String, encoding as System.Text.Encoding)
  • XmlTextWriter(w as System.IO.Stream, encoding as System.Text.Encoding)
  • XmlTextWriter(w as System.IO.TextWriter)

In our case, it is very important that we are able to specify the encoding manually. I'll explain why.

When I first approached writing to an in-memory string using the XmlTextWriter, my initial instinct was to create a StringWriter object and pass it to the third constructor on our list. So, a call would look something like this:

Dim sb As New StringBuilder()
Dim writer As New XmlTextWriter(New StringWriter(sb))
'write the XML using the writer
'...
writer.Close()
Return sb.ToString()

There's a major problem with this code, although it seems to work perfectly. Internally, all .NET strings are UTF-16, so a StringWriter reports its encoding as UTF-16, and the XmlTextWriter built on top of it emits an XML declaration that says encoding="utf-16". That is a problem because, if we want RSS readers to be able to parse our XML string, it needs to be encoded as UTF-8. The constructor shown above gives us no way to change the encoding, not even after the writer has been created.
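
The encoding problem is easy to reproduce in isolation. The following is a small sketch, with a hypothetical module and function name, that writes a one-element document through a StringWriter and shows which encoding ends up in the XML declaration.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module Utf16DeclarationDemo

    ' Builds a one-element document through a StringWriter and returns the text.
    Function BuildWithStringWriter() As String
        Dim sb As New StringBuilder()
        Dim writer As New XmlTextWriter(New StringWriter(sb))
        writer.WriteStartDocument()
        writer.WriteElementString("rss", "")
        writer.WriteEndDocument()
        writer.Close()
        Return sb.ToString()
    End Function

    Sub Main()
        ' A StringWriter always reports UTF-16 as its encoding...
        Console.WriteLine(New StringWriter().Encoding.WebName)
        ' ...and the XML declaration picks that name up.
        Console.WriteLine(BuildWithStringWriter())
    End Sub

End Module
```

The declaration in the returned string names utf-16, even though no encoding was ever chosen explicitly in the code.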

So, to solve this we are left with the other two constructors, which do let us specify the encoding of the output. The first constructor, which accepts a file name, needs no explanation. The second one is the one we use to write to an in-memory string, and it is the one that causes the most problems.

Here's the “GetRss()” method again:

Public Overloads Function GetRss() As String
        Dim ms As New MemoryStream
        Dim sr As New StreamWriter(ms, Encoding.UTF8)

        'We send "false" to signal the method to not close the stream automatically in the end
        'we need to close the stream manually so we can get its length
        WriteRSS(sr, False)
        Try

            ''we need to explicitly state the length
            'of the buffer we want
            'otherwise we'll get a string as long as ms.capacity
            'instead of the actual length of the string inside
            Dim iLen As Integer = CInt(ms.Length)
            Dim retval As String = _
                Encoding.UTF8.GetString(ms.GetBuffer(), 0, iLen)

            sr.Close()
            Return retval

        Catch ex As Exception
            Return ex.ToString()

        End Try

End Function

Because we can't send a StringWriter object to WriteRss(), we're left with sending in a StreamWriter. We need to initialize it with a stream that lives in memory: a MemoryStream. The first part of the method is rather easy; we initialize the StreamWriter and send it over to WriteRss(). The trouble begins after that. How do you retrieve the text inside a MemoryStream? Conveniently enough, it has a GetBuffer() method, which returns the byte array that holds the stream's contents. We now just need to turn this byte array into a string by decoding it as UTF-8. To that end I use the System.Text.Encoding.UTF8.GetString(byte()) method, which does exactly that. The two GetString() overloads that matter here are:

  • System.Text.Encoding.UTF8.GetString(bytes as Byte())
  • System.Text.Encoding.UTF8.GetString(bytes as Byte(), index as Integer, count as Integer)

Why am I using the more complex version, then? It seems perfectly reasonable to call the first one, right? Wrong. When I used the first version of the method, my resulting XML string contained garbage at the end. It was mostly garbled and mangled spaces, which wreaked havoc in IE when trying to view the feed, although it seemed to work fine in my aggregator. However, the feed failed to validate at www.FeedValidator.org, which said something about a “Missing token” and pointed at the last line of the XML. I could not figure it out and was stuck. In fact, I didn't even think of trying the other overload until I got a little help from Mike Gunderloy. The problem was that the GetBuffer() method returns the stream's entire underlying buffer, whose size is given by the MemoryStream.Capacity property. Because a MemoryStream grows its buffer in steps rather than byte by byte, the capacity is almost always larger than the current contents of the stream, so the resulting output is contents + garbage. Using the second overload, I could get the exact contents of the stream by passing the stream's length as the number of bytes to decode. The Length property contains the actual length of the data.
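
The difference between the two overloads can be seen with a few bytes of data. The following is a minimal sketch with hypothetical helper names (MakeStream, DecodeWholeBuffer, DecodeWrittenBytes); it writes a short string into a MemoryStream and decodes the buffer both ways.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BufferLengthDemo

    ' Creates a MemoryStream that holds the UTF-8 bytes of the given text.
    Function MakeStream(ByVal text As String) As MemoryStream
        Dim ms As New MemoryStream()
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(text)
        ms.Write(bytes, 0, bytes.Length)
        Return ms
    End Function

    ' First overload: decodes the entire internal buffer, unused bytes included.
    Function DecodeWholeBuffer(ByVal ms As MemoryStream) As String
        Return Encoding.UTF8.GetString(ms.GetBuffer())
    End Function

    ' Second overload: decodes only the bytes that were actually written.
    Function DecodeWrittenBytes(ByVal ms As MemoryStream) As String
        Return Encoding.UTF8.GetString(ms.GetBuffer(), 0, CInt(ms.Length))
    End Function

    Sub Main()
        Dim ms As MemoryStream = MakeStream("<rss />")
        Console.WriteLine("Length={0}, Capacity={1}", ms.Length, ms.Capacity)
        Console.WriteLine(DecodeWrittenBytes(ms).Length) ' same as ms.Length here
        Console.WriteLine(DecodeWholeBuffer(ms).Length)  ' same as ms.Capacity here
    End Sub

End Module
```

In this sketch the unused part of the buffer is filled with zero bytes, so the string from the first overload ends in a run of null characters. MemoryStream also has a ToArray() method that copies exactly Length bytes into a new array, which may be a simpler alternative where the extra copy is acceptable.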

This is also why I had to signal my WriteRss() implementation to not close the stream after it finished writing. Had I closed the stream, I could not have read its length, which I needed to retrieve the results.

About the author

Roy Osherove Israel

Roy Osherove has spent the past 6+ years developing data driven applications for various companies in Israel. He's acquired several MCP titles, written a number of articles on various .NET topic...