Retrieving HTTP content in .NET

New HTTP tools in .NET

The .NET Framework provides new tools for retrieving HTTP content that are powerful and scalable in a single package. If you've ever tried to retrieve HTTP content in pre-.NET applications, you probably know that there are a number of different tools available: WinInet (Win32 API), XMLHTTP (part of MSXML) and, more recently, the new WinHTTP COM library. Each of these tools worked in some situations, but none of them fit the bill for every situation. For example, WinInet can't scale on the server because it has no multi-threading support. XMLHTTP was too simple and didn't support all aspects of the HTTP model. WinHTTP, the latest Microsoft tool for COM, solves many of these problems, but it doesn't work at all on Win9x, which makes it a bad choice for a client tool integrated into broadly distributed applications, at least until XP takes a strong hold.

The .NET Framework greatly simplifies HTTP access with a pair of classes, HttpWebRequest and HttpWebResponse. These classes expose just about all of the functionality of the HTTP protocol in a straightforward manner. The basics of retrieving content from the Web require very little code (see Listing 1).

Listing 1: Simple retrieval of Web data over HTTP.

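using System.IO;     // StreamReader
using System.Net;    // WebRequest, HttpWebRequest, HttpWebResponse
using System.Text;   // Encoding

string lcUrl = "http://www.west-wind.com/TestPage.wwd";

// *** Establish the request
HttpWebRequest loHttp =
    (HttpWebRequest) WebRequest.Create(lcUrl);

// *** Set properties
loHttp.Timeout = 10000;    // 10 secs
loHttp.UserAgent = "Code Sample Web Client";

// *** Send the request and retrieve the response headers
HttpWebResponse loWebResponse = (HttpWebResponse) loHttp.GetResponse();

Encoding enc = Encoding.GetEncoding(1252);  // Windows default Code Page

// *** Wrap the response stream in a reader that decodes bytes to text
StreamReader loResponseStream =
  new StreamReader(loWebResponse.GetResponseStream(),enc);

// *** Read the rest of the response into a string
string lcHtml = loResponseStream.ReadToEnd();

// *** Close the reader first, then the response that owns the stream
loResponseStream.Close();
loWebResponse.Close();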

Pretty simple, right? But beneath this simplicity lies a lot of power. Let's start by looking at how this works.

Start by creating the HttpWebRequest object, which is the base object used to initiate a Web request. The static WebRequest.Create() method parses the URL and passes the resolved URL into the request object. This call throws an exception if the URL passed has invalid syntax.
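As a rough sketch of that last point, the snippet below wraps WebRequest.Create() so a bad URL yields null instead of an exception. TryCreate is a hypothetical helper name of my own, not part of the framework. I'm assuming a UriFormatException for malformed syntax and a NotSupportedException for a scheme with no registered handler. No network traffic happens at this stage, since nothing is sent until the response is requested.

```csharp
using System;
using System.Net;

public static class CreateRequestDemo
{
    // Hypothetical helper: returns null instead of throwing when the URL
    // cannot be parsed or its scheme has no registered handler.
    public static WebRequest TryCreate(string url)
    {
        try
        {
            return WebRequest.Create(url);
        }
        catch (UriFormatException)
        {
            return null;   // malformed URL syntax
        }
        catch (NotSupportedException)
        {
            return null;   // no request handler for this URL scheme
        }
    }

    public static void Main()
    {
        WebRequest loGood = TryCreate("http://www.west-wind.com/TestPage.wwd");
        WebRequest loBad = TryCreate("not a url");

        Console.WriteLine("Valid URL gave an HttpWebRequest: " + (loGood is HttpWebRequest));
        Console.WriteLine("Invalid URL was rejected: " + (loBad == null));
    }
}
```

Whether you swallow the exception like this or let it bubble up depends on where the URL comes from; for user-entered URLs a friendly failure is usually the better choice.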

The request portion controls how the outbound HTTP request is structured. As such it handles configuration of the HTTP headers, the most common of which are expressed as properties of the HttpWebRequest object. A few examples are UserAgent, ContentType, Referer and even a CookieContainer for cookies, which map directly to header values that get set when the request is sent. Headers can also be set explicitly using the Headers string collection, to which you can add either a whole header string or a key value pair. Generally the properties address all common headers, so you'll rarely need to set headers explicitly, most likely only to support special protocols (for example, SOAPAction for SOAP requests).
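Here's a small sketch of both approaches side by side: typed properties for the common headers and the Headers collection for everything else. The SOAPAction value and the X-Sample-Header name are made up for illustration, and the request is only configured here, never sent.

```csharp
using System;
using System.Net;

public static class RequestHeaderDemo
{
    public static HttpWebRequest Build(string url)
    {
        HttpWebRequest loHttp = (HttpWebRequest) WebRequest.Create(url);

        // Common headers are exposed as typed properties.
        loHttp.UserAgent = "Code Sample Web Client";
        loHttp.ContentType = "text/xml";

        // Cookies are handled through a container rather than a raw header.
        loHttp.CookieContainer = new CookieContainer();

        // Headers without a property go into the Headers collection,
        // either as a name/value pair or as one "Name: value" string.
        // Both header values below are hypothetical.
        loHttp.Headers.Add("SOAPAction", "\"http://example.com/hypothetical/Action\"");
        loHttp.Headers.Add("X-Sample-Header: demo");

        return loHttp;
    }

    public static void Main()
    {
        HttpWebRequest loHttp = Build("http://www.west-wind.com/TestPage.wwd");

        // Dump the headers that would go out with the request.
        Console.WriteLine(loHttp.Headers);
    }
}
```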

In the example, I do nothing much with the request other than setting a couple of the optional properties – the UserAgent (the client 'browser' which is blank otherwise) and the Timeout for the request. If you need to POST data to the server you'll need to do a little more work – I'll talk about this a little later.

Streaming good deals

Once the HTTP request is configured, a call to GetResponse() actually sends the request to the Web server. At this point the request sends the headers and retrieves the first HTTP result buffer from the Web server.

When the code above performs the GetResponse() call only a small chunk of data is returned from the Web server. The first chunk contains the HTTP header and the very first part of the data, which is simply buffered internally until read from the stream itself. The data from this initial request is used to set the properties of the HttpWebResponse object, so you can look at things like ContentType, ContentLength, StatusCode, Cookies and much more.

Next a stream is returned using the GetResponseStream() method. The stream points at the actual binary HTTP response from the Web server. Streams give you a lot of flexibility in handling how data is retrieved from the Web server.

As mentioned, the call to GetResponse() only returned an initial internal buffer – to retrieve the actual data and read the rest of the result document from the Web server you have to read the stream.

In the example above I use a StreamReader object to return a string from the data in a single operation. But realize that because a stream is returned, I could also access the stream directly and read smaller chunks to, say, provide status information on the progress of the HTTP download.
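To give an idea of what reading in chunks might look like, here's a minimal sketch. ReadAll and its progress callback are names I made up for illustration. A MemoryStream stands in for the stream returned by GetResponseStream() so the sample runs without a network connection, but any Stream should work the same way.

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkedReadDemo
{
    // Reads the stream in pieces of up to bufferSize bytes, reports the
    // running byte count after each piece, and returns all bytes read.
    public static byte[] ReadAll(Stream source, int bufferSize, Action<long> onProgress)
    {
        byte[] laBuffer = new byte[bufferSize];
        MemoryStream loResult = new MemoryStream();
        long lnTotal = 0;
        int lnRead;

        // Read() returns 0 only when the end of the stream is reached.
        while ((lnRead = source.Read(laBuffer, 0, laBuffer.Length)) > 0)
        {
            loResult.Write(laBuffer, 0, lnRead);
            lnTotal += lnRead;
            if (onProgress != null) onProgress(lnTotal);
        }
        return loResult.ToArray();
    }

    public static void Main()
    {
        // Stand-in for loWebResponse.GetResponseStream().
        Stream loFake = new MemoryStream(Encoding.ASCII.GetBytes(new string('x', 10000)));

        byte[] laData = ReadAll(loFake, 4096,
            delegate(long lnBytes) { Console.WriteLine(lnBytes + " bytes read"); });

        Console.WriteLine("Done: " + laData.Length + " bytes");
    }
}
```

With a real network stream each Read() may return fewer bytes than the buffer size, which is why the loop tracks the actual count rather than assuming full buffers.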

Notice also that when the StreamReader is created I had to explicitly provide an encoding – in this case code page 1252, which is the Windows default code page. This is important because the data is transferred as a byte stream, and decoding it with the wrong encoding results in invalid character translations for any extended characters. Code page 1252 works fairly well for English or Western European language content. Ideally, though, you should decide at runtime how to handle the response – for example, binary content should be written from the stream to a file or other location rather than converted to a string, while a page from Japan should be decoded with the encoding the server reports for it (exposed through the response's CharacterSet property).
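One way to make that runtime decision is sketched below. Resolve is a hypothetical helper that takes a charset name, such as the value of HttpWebResponse.CharacterSet, and falls back to a default when the name is missing or not recognized on the current machine. The charset names in Main are only sample inputs.

```csharp
using System;
using System.Text;

public static class EncodingPicker
{
    // Maps a charset name (for example from the response's CharacterSet
    // property) to an Encoding, using the fallback when the name is
    // missing or unknown to this runtime.
    public static Encoding Resolve(string charset, Encoding fallback)
    {
        if (string.IsNullOrEmpty(charset)) return fallback;

        try
        {
            // Some servers wrap the charset in quotes; strip them first.
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            return fallback;   // unknown charset name
        }
        catch (NotSupportedException)
        {
            return fallback;   // known name, but not available here
        }
    }

    public static void Main()
    {
        Console.WriteLine(Resolve("utf-8", Encoding.ASCII).WebName);
        Console.WriteLine(Resolve("no-such-charset", Encoding.ASCII).WebName);
        Console.WriteLine(Resolve(null, Encoding.UTF8).WebName);
    }
}
```

The resulting Encoding can then be passed to the StreamReader constructor in place of the hard-coded code page.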

StreamReader also exposes the underlying raw stream using the BaseStream property, so StreamReader is a good object to use to pass streamed data around.

POSTing data

The example above only retrieves data which is essentially an HTTP GET request. If you want to send data to the server you can use an HTTP POST operation. POSTing data refers to the process of taking data and sending it to the Web server as part of the request payload. A POST operation both sends data to the server and retrieves a response.

Posting uses a stream to send the data to the server, so the process of posting data is pretty much the reverse of retrieving the data (see Listing 2).

Listing 2: POSTing data to the Web Server

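using System.IO;     // Stream, StreamReader
using System.Net;    // WebRequest, HttpWebRequest, HttpWebResponse
using System.Text;   // Encoding
using System.Web;    // HttpUtility (requires a reference to System.Web)

string lcUrl = "http://www.west-wind.com/testpage.wwd";
HttpWebRequest loHttp =
  (HttpWebRequest) WebRequest.Create(lcUrl);

// *** Build the POST data as URL encoded key value pairs
string lcPostData =
  "Name=" + HttpUtility.UrlEncode("Rick Strahl") +
  "&Company=" + HttpUtility.UrlEncode("West Wind ");

loHttp.Method = "POST";

// *** Tell the server the body is HTML form data so it can parse the pairs
loHttp.ContentType = "application/x-www-form-urlencoded";

// *** Convert the string to bytes and announce the size of the body
byte[] lbPostBuffer = Encoding.GetEncoding(1252).GetBytes(lcPostData);
loHttp.ContentLength = lbPostBuffer.Length;

// *** Write the POST buffer to the request stream
Stream loPostData = loHttp.GetRequestStream();
loPostData.Write(lbPostBuffer,0,lbPostBuffer.Length);
loPostData.Close();

// *** Retrieve the response exactly as in Listing 1
HttpWebResponse loWebResponse = (HttpWebResponse) loHttp.GetResponse();

Encoding enc = Encoding.GetEncoding(1252);

StreamReader loResponseStream =
  new StreamReader(loWebResponse.GetResponseStream(),enc);

string lcHtml = loResponseStream.ReadToEnd();

// *** Close the reader first, then the response that owns the stream
loResponseStream.Close();
loWebResponse.Close();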

Make sure this POST code runs immediately before the HttpWebRequest.GetResponse() call. Any other changes to the request object after this point have no effect, because the headers get sent along with the POST buffer. The rest of the code is identical to what was shown before – you retrieve the response and then read the stream to grab the result data.

POST data needs to be properly encoded when sent to the server. If you're posting information to a Web page, you'll have to encode your POST buffer into key value pairs and URL encode the values. You can use the static method System.Web.HttpUtility.UrlEncode() to encode the data. In this case make sure your project references the System.Web assembly and imports the System.Web namespace. Note this is necessary only if you're posting to a typical HTML page – if you're posting XML or other application content you can just post the raw data as is. This is all much easier to do using a custom class like the one included with this article. This class has an AddPostKey method and, depending on the POST mode, it will take any parameters and properly encode them into an internally managed stream which is then POSTed to the server.
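To illustrate the encoding step on its own, here's a minimal sketch of a helper that builds a form-style POST body from key value pairs. FormBody.Build is a hypothetical name and is not the AddPostKey method of the class mentioned above. I've also swapped in System.Net.WebUtility.UrlEncode() instead of HttpUtility.UrlEncode() so the sample doesn't need the System.Web reference; that assumes a framework version that includes WebUtility.

```csharp
using System;
using System.Net;
using System.Text;

public static class FormBody
{
    // Takes alternating keys and values (key1, value1, key2, value2, ...)
    // and returns them as a URL encoded "key=value&key=value" string.
    public static string Build(params string[] pairs)
    {
        if (pairs.Length % 2 != 0)
            throw new ArgumentException("Expected an even number of strings.");

        StringBuilder loBody = new StringBuilder();
        for (int x = 0; x < pairs.Length; x += 2)
        {
            if (loBody.Length > 0) loBody.Append('&');

            // Encode keys and values separately so that '&' and '='
            // inside the data cannot break the pair structure.
            loBody.Append(WebUtility.UrlEncode(pairs[x]));
            loBody.Append('=');
            loBody.Append(WebUtility.UrlEncode(pairs[x + 1]));
        }
        return loBody.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build("Name", "Rick Strahl", "Company", "West Wind"));
    }
}
```

The resulting string is what gets converted to a byte array and written to the request stream.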

To send the actual data in the POST buffer, the data has to be converted to a byte array first. Again we need to properly encode the string: calling GetBytes() on Encoding.GetEncoding(1252) returns a byte array using the Windows standard ANSI code page. You should then set the ContentLength property so the server knows the size of the data stream coming in. Finally you can write the POST data to the server using an output stream returned from HttpWebRequest.GetRequestStream(). Simply write the entire byte array out to the stream in one Write() method call with the appropriate size of the byte array. This writes the data and waits for completion. As with the retrieval operation, the stream operations are what actually cause data to be sent to the server, so if you want to provide progress information you can send smaller chunks and provide feedback to the user if needed.
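Sending in smaller chunks could look something like the sketch below. WriteInChunks and its callback are hypothetical names, and a MemoryStream stands in for the stream returned by GetRequestStream() so the sample runs offline. The sample body is encoded as UTF-8 here only to keep the demo portable; use whatever encoding your server expects.

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkedWriteDemo
{
    // Writes data to the target stream in pieces of up to chunkSize bytes
    // and reports the number of bytes sent so far after each piece.
    public static void WriteInChunks(Stream target, byte[] data, int chunkSize, Action<int> onProgress)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException("chunkSize");

        int lnSent = 0;
        while (lnSent < data.Length)
        {
            // The last chunk may be smaller than chunkSize.
            int lnCount = Math.Min(chunkSize, data.Length - lnSent);
            target.Write(data, lnSent, lnCount);
            lnSent += lnCount;
            if (onProgress != null) onProgress(lnSent);
        }
    }

    public static void Main()
    {
        byte[] lbPostBuffer = Encoding.UTF8.GetBytes("Name=Rick+Strahl&Company=West+Wind");

        // Stand-in for loHttp.GetRequestStream().
        MemoryStream loFake = new MemoryStream();

        WriteInChunks(loFake, lbPostBuffer, 16,
            delegate(int lnBytes) { Console.WriteLine(lnBytes + " of " + lbPostBuffer.Length + " bytes sent"); });
    }
}
```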
