An Introduction To WebSockets

Lately, there has been much talk about the WebSockets API, and how it could change the way web applications are developed. In this article, we will take a look at:

  • Why we need WebSockets
  • What WebSockets are
  • What makes them great
  • What it could mean for the future of web applications

By the end of this article, I hope you’ll appreciate just how useful WebSockets are and know how to get started building your own WebSocket applications.

Why do we need WebSockets?

So why do we need WebSockets? What problem are we trying to solve by using them? The answer is easy. We need a better way for web applications running on a client browser to communicate in real time with their servers. Currently, there are two common methods of providing this.

  • The first is for the application to poll the server continuously for new data. If there is any, it is sent to the client, generally via AJAX. This is similar to the way some children pester their parents by asking “Are we there yet?” every few seconds on a car ride. Much to the parents’ chagrin, they have to keep answering ‘no’ until they finally reach their destination. Polling is just like that: the application asks whether there is new data at regular intervals, and the server has to respond every time, even if there is nothing new to give.
  • The second is called ‘Long Polling’. This is a variation of the first technique but instead of the server giving an ‘empty’ response and closing the connection when it has no new data to give, the connection between client and server is kept open (with a timeout period). At some point in the future when the server does have some new data to give, it is given to the client and the connection is closed (provided it does so within the timeout period). This is better than polling in most ways, but if you try to use this approach in applications where a lot of data is generated very fast, then it becomes almost like the polling technique.
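
The difference between the two techniques can be illustrated with a back-of-the-envelope model that simply counts requests. This is only a sketch: the function names and the timings are made up for illustration, and real network behaviour is ignored.

```javascript
// Plain polling: one request per interval, whether or not there is new data.
function countPollingRequests(durationMs, intervalMs) {
  return Math.floor(durationMs / intervalMs);
}

// Long polling: each request stays open until the next event arrives
// or the timeout expires, and a new request is opened immediately after.
function countLongPollingRequests(eventTimesMs, durationMs, timeoutMs) {
  const events = [...eventTimesMs].sort((a, b) => a - b);
  let now = 0;
  let requests = 0;
  while (now < durationMs) {
    requests += 1;
    const nextEvent = events.find((t) => t > now);
    const deadline = now + timeoutMs;
    // The request ends at whichever comes first: new data or the timeout.
    now = nextEvent !== undefined && nextEvent < deadline ? nextEvent : deadline;
  }
  return requests;
}

// One event in a second: long polling needs far fewer requests.
console.log(countPollingRequests(1000, 100));                // 10
console.log(countLongPollingRequests([500], 1000, 30000));   // 2

// An event every 100 ms: long polling degrades to the same request count.
const busy = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000];
console.log(countLongPollingRequests(busy, 1000, 30000));    // 10
```

With sparse data, long polling saves most of the round trips; with rapidly generated data, every request returns almost immediately and the saving disappears.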

Both methods have their merits when compared with each other, but they also share a common set of disadvantages developers could do without.

  • Both use HTTP to send messages to the server. Every message sent over this protocol is wrapped in a lot of header information describing things like where the message is heading, where it came from, the user agent and so on. All of this adds a lot of overhead when communicating in real time.
  • Neither of these methods is ‘bi-directional full duplex’, where both client and server can send and receive each other’s messages at the same time, as in a telephone system, where the people at both ends can talk and hear simultaneously.

These are the reasons current techniques are not good enough for fast, scalable real-time communication on the web. We need a better solution, and that is what WebSockets give us.

What are WebSockets?

WebSockets are a new way for clients to communicate with servers and vice versa, without the overhead of HTTP. They use their own protocol, which is defined by the IETF; the latest version is RFC 6455. Previous versions of the protocol proved to have some security issues, so while they were implemented in a few browsers like Opera, they were not enabled by default. The newest version of the protocol seems to have addressed these issues, and browsers are working on supporting it now.

Apart from the protocol, there is also an API that web applications can use to open and close connections and to send and receive messages. This is called the WebSockets API and is defined in a W3C specification.

With WebSockets you can have full duplex bi-directional communication between the server and the client with less overhead than traditional HTTP based methods. This promises faster, more scalable and more robust high performance real time applications on the web. In fact, according to some analysis by the Kaazing Corporation, it could reduce the size of HTTP header traffic by 500:1 to 1000:1 and reduce network latency by 3:1. That translates to some serious performance improvements, especially for applications requiring fast real-time updates.
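
To put the overhead in perspective, here is a rough sketch of how many bytes of framing the protocol adds to each message once the connection is open. The sizes follow the framing rules in RFC 6455; the function name frameHeaderBytes is made up for illustration.

```javascript
// Returns the number of header bytes a single WebSocket frame carries.
// `masked` is true for client-to-server frames, which carry a 4-byte mask.
function frameHeaderBytes(payloadLength, masked) {
  let bytes = 2; // flags/opcode byte + mask bit/7-bit length byte
  if (payloadLength > 65535) {
    bytes += 8; // 64-bit extended payload length
  } else if (payloadLength > 125) {
    bytes += 2; // 16-bit extended payload length
  }
  if (masked) {
    bytes += 4; // masking key
  }
  return bytes;
}

console.log(frameHeaderBytes(20, false)); // 2
console.log(frameHeaderBytes(20, true));  // 6
```

A short chat message therefore costs only a handful of extra bytes per frame, compared with the full set of HTTP headers that accompanies every polling request.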

How does it work?

Before the client and the server start sending and receiving messages, they need to establish a connection first. This is done by establishing a ‘handshake’, where the client sends out a request to connect, and if the server wants, it will send out a response accepting the connection. The protocol specification makes it clear that one of the design decisions when making this protocol was to ensure that both HTTP based clients and WebSocket based ones can operate on the same port. This is why the handshake is such that the client and server ‘upgrade’ from an HTTP based protocol to a WebSocket based protocol.

The protocol spec has an example of such a handshake. The initiating handshake from the client should look like this:

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13

and the responding handshake from the server should look like this:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat

Here the client sends a base64-encoded key in the Sec-WebSocket-Key header. To form a response, the server appends the magic string 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 to this key and calculates the SHA-1 hash of the resulting string. It then base64-encodes that hash, and the result becomes the value of the Sec-WebSocket-Accept header in the server’s response.

In the above example,

  • The client sends the Sec-WebSocket-Key string dGhlIHNhbXBsZSBub25jZQ==
  • The server appends the magic string to form the string dGhlIHNhbXBsZSBub25jZQ==258EAFA5-E914-47DA-95CA-C5AB0DC85B11
  • Now the server generates the SHA-1 hash for this longer string, which is b37a4f2cc0624f1690f64606cf385945b2bec4ea in hexadecimal
  • Finally, the server base64-encodes the raw bytes of the hash (not its hexadecimal text) to give s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
  • And this base64-encoded value is used in the Sec-WebSocket-Accept header in the server’s response.
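
The steps above can be sketched in a few lines of server-side JavaScript. This is a minimal sketch that assumes a Node.js environment and its built-in crypto module; the function name computeAcceptKey is made up for illustration.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455 (the "magic string").
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Hypothetical helper: derives the Sec-WebSocket-Accept value
// from the client's Sec-WebSocket-Key.
function computeAcceptKey(clientKey) {
  return crypto
    .createHash('sha1')                 // SHA-1 over key + magic string
    .update(clientKey + WEBSOCKET_GUID) // note: no separator between the two
    .digest('base64');                  // base64 of the raw hash bytes
}

console.log(computeAcceptKey('dGhlIHNhbXBsZSBub25jZQ=='));
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```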

An important thing to note is the Origin header. A browser client’s handshake will always include this header, and it is then up to the server to decide whether to accept clients from different origins.
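
A server-side origin check might look something like the sketch below. The allow-list and the function name isOriginAllowed are assumptions made for illustration, and the headers argument is assumed to be a plain object with lower-cased header names, which is how Node.js exposes request headers.

```javascript
// Hypothetical allow-list; replace with the origins your application trusts.
const ALLOWED_ORIGINS = ['http://example.com'];

// Returns true when the handshake request's Origin header is on the allow-list.
// Requests without an Origin header are rejected in this sketch.
function isOriginAllowed(headers, allowedOrigins) {
  const origin = headers['origin'];
  return typeof origin === 'string' && allowedOrigins.includes(origin);
}

console.log(isOriginAllowed({ origin: 'http://example.com' }, ALLOWED_ORIGINS)); // true
console.log(isOriginAllowed({ origin: 'http://evil.example' }, ALLOWED_ORIGINS)); // false
```

Rejecting requests with no Origin header is a design choice here: a non-browser client may omit the header, so a server that needs to accept such clients would have to relax this check.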

The WebSockets API

Front-end web developers will be more interested in the WebSockets API[2], a JavaScript API that developers use to exchange messages between their client-side app and the server.

Does The Browser Support WebSockets?

The first thing all developers should do when working with the WebSockets API is to detect whether or not the client browser supports it. If so, we can work our magic. If not, we’ll have to fall back to another method of client-server communication, such as the long polling mentioned above.

if ('WebSocket' in window){
   /* WebSocket is supported. You can proceed with your code*/
} else {
   /*WebSockets are not supported. Try a fallback method like long-polling etc*/
}

Opening and Closing WebSocket Connections

Assuming that WebSockets are supported by the browser, the first task will be to connect to a WebSocket server by calling the WebSocket constructor:

var connection = new WebSocket('ws://example.org:12345/myapp');

You could also use wss://, which is the secure socket variant to ws:// in the same way https is to http.

var connection = new WebSocket('wss://secure.example.org:67890/myapp');

You could also specify sub-protocols of your own like so:

var connection = new WebSocket('ws://example.org:12345/myapp', ['chat', 'super-awesome-chat']);

If your connection is accepted and established by the server, then an onopen event is fired on the client’s side. You can handle it like so:

connection.onopen = function(){
   /*Send a small message to the console once the connection is established */
   console.log('Connection open!');
}

If the connection is refused by the server, or is closed for some other reason, then the onclose event is fired.

connection.onclose = function(){
   console.log('Connection closed');
}

You can even explicitly close it on your own by calling the close() method, like so:

connection.close();
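
The event object passed to the onclose handler carries a few details about why the connection ended: a numeric code, an optional reason string and a wasClean flag. The helper below is a small sketch of how they might be logged; the name describeClose is made up for illustration.

```javascript
// Hypothetical helper: turns a close event into a readable log line.
function describeClose(event) {
  const how = event.wasClean ? 'cleanly' : 'unexpectedly';
  const reason = event.reason ? ' (' + event.reason + ')' : '';
  return 'Connection closed ' + how + ' with code ' + event.code + reason;
}

// In the browser, with the `connection` object from above:
// connection.onclose = function (event) {
//   console.log(describeClose(event));
// };
// connection.close(1000, 'Work complete'); // 1000 means normal closure
```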

In case of any errors, you can handle them using the onerror event.

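connection.onerror = function(error){
   /* error is an event object, so pass it to console.log as its own
      argument; concatenating it would only print "[object Event]" */
   console.log('Error detected:', error);
}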

Sending and Receiving Messages

Once we’ve successfully opened a connection to the server, we need to send messages to and receive messages from the server. Sending messages is very straightforward. We use the .send() method on our connection object.

connection.send('Hey server, whats up?');

Should the client receive a message from the server, it raises the onmessage event for you to handle.

connection.onmessage = function(e){
   var server_message = e.data;
   console.log(server_message);
}

If you want to send JSON objects to the server rather than a simple message, they should be serialized to a string, like so:

var message = {
   'name': 'bill murray',
   'comment': 'No one will ever believe you'
};
connection.send(JSON.stringify(message));
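
On the receiving side, the same idea works in reverse: the string in e.data can be turned back into an object with JSON.parse. The sketch below guards against payloads that are not valid JSON; the helper name parseJsonMessage is made up for illustration.

```javascript
// Hypothetical helper: parses an incoming message as JSON,
// returning null when the payload is not valid JSON.
function parseJsonMessage(data) {
  try {
    return JSON.parse(data);
  } catch (err) {
    return null;
  }
}

// In the browser, with the `connection` object from above:
// connection.onmessage = function (e) {
//   var message = parseJsonMessage(e.data);
//   if (message !== null) {
//     console.log(message.name + ' says: ' + message.comment);
//   }
// };
```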

N.B. The WebSockets specification states that, as well as strings, messages can be sent in binary form using either Blob or ArrayBuffer objects. However, not all browsers currently (as of May 2012) support this.
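
Where binary messages are supported, sending and receiving an ArrayBuffer could look roughly like this. It is a sketch only: the helper names packFloats and unpackFloats are made up for illustration, and the choice of 32-bit floats is arbitrary.

```javascript
// Packs an array of numbers into an ArrayBuffer of 32-bit floats.
function packFloats(values) {
  return new Float32Array(values).buffer;
}

// Reads the numbers back out of an ArrayBuffer.
function unpackFloats(buffer) {
  return Array.from(new Float32Array(buffer));
}

// In the browser, with the `connection` object from above:
// connection.binaryType = 'arraybuffer'; // receive binary data as ArrayBuffer
// connection.send(packFloats([1.5, 2.25]));
// connection.onmessage = function (e) {
//   if (e.data instanceof ArrayBuffer) {
//     console.log(unpackFloats(e.data));
//   }
// };
```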

Supporting WebSockets on the Server

Most web servers revolve solely around the HTTP protocol. As WebSockets use their own protocol, you may need to install additional libraries and add-ons to support ws:// or the wss:// protocols in addition to http:// and https://.

Cross browser support: Are we there yet?

The latest version of the WebSocket Protocol (RFC 6455) is currently supported by only a couple of the major browsers (Chrome and Opera). While we wait for the other browsers to catch up, however, there are several ways to roll out cross-browser WebSocket-based applications right now.

A nice cross-browser way to do JavaScript-based real-time communication is socket.io. It works with Node.js and other technologies, using WebSockets if the client supports them and falling back on other transports such as Flash if it does not; it even has AJAX polling and multi-part streaming in its arsenal. It builds upon various technologies (of which WebSockets is one) to create an abstraction layer that all clients can use in a unified, cross-browser way.

Another way to go would be cloud-hosted API services like Pusher or BeaconPush. Instead of rolling out your own WebSocket server, you could use these types of services to run one, and interact on the client side with the API they provide. Generally they provide a Flash fallback (which simulates WebSockets) in case the browser does not support WebSockets.

Summary

WebSockets provide a really simple way to do fast, robust and very efficient communication between the client and the server, removing some of the problems we face with HTTP. This technology is especially suited to applications where a large amount of data is generated rapidly and needs to be communicated quickly. One very good use is HTML5-based multiplayer online games (especially ones that require quick response times, like first-person shooters). Other possible uses on the web include real-time breaking-news updates, fast-updating streams on social media, sports scores and online chat applications.

References

About the author

Shwetank Dixit, India

Shwetank Dixit works as a Web Evangelist for Opera Software. He likes the idea of a web built with open web standards and writes articles, speaks at conferences and talks to developers about all...
