Tuesday, May 15, 2007

Push vs. Pull

I have recently started using Twitter together with my colleagues at work.
(For those who have spent the last few months under a rock: Twitter is a social Web 2.0 service/miniblog where you can send messages of at most 140 characters, and your Twitter-using friends will see your messages on their page. The stated purpose of these messages is for people to tell where they are and what they are doing, so that their friends don't have to call or email to ask what's up or when they are going to lunch. Sounds simple, eh? The real trick is that you can send messages to Twitter by IM and SMS, and also receive your friends' updates the same way. That sure beats hitting Reload on your Twitter home page. Being able to post and receive updates while on the road is cool.)
There are 4 ways to use Twitter:
  1. Keep the Twitter home page open in a web browser. The page reloads itself every few minutes (until one reload fails, after which automatic reloading stops completely; most annoying). The home page shows the last few updates from you and your friends and has a text box for posting a new message, with a nice prompt: "What are you doing?"
  2. Subscribe to the home page's RSS feed in your feed reader. This method is, of course, read-only: there is no way to post new messages.
  3. Give Twitter your IM handle, then add Twitter to your IM app's buddy list. The Twitter buddy sends you new messages from your friends, and you can post new messages to Twitter by chatting with it.
  4. Send and receive SMS messages on your mobile phone.
The first 2 methods are Pull. The last 2 are Push.

With Pull, you have to poll Twitter to find out whether anything new has happened. If you poll often, you put unnecessary load on Twitter's servers (which seem to be struggling with the load as it is). If you poll seldom, your friends may wonder what's the matter with you because you don't answer. Slow and heavy.
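
To put rough numbers on this trade-off, here is a back-of-the-envelope sketch in Python. It assumes N clients that each poll once every T seconds, with polls evenly spread out, and updates that arrive at random moments; the client count and intervals are made-up illustrations, not Twitter's real figures. Under those assumptions the server handles N/T requests per second, and an update waits T/2 seconds on average before the next poll picks it up.

```python
def polling_cost(clients: int, interval_s: float) -> tuple[float, float]:
    """Return (server requests per second, average delay in seconds).

    Hypothetical helper. Assumes every client polls once per interval and an
    update arrives at a uniformly random moment between two polls.
    """
    requests_per_second = clients / interval_s
    average_delay_s = interval_s / 2
    return requests_per_second, average_delay_s


# Made-up numbers: 10,000 clients polling at three different intervals.
for interval in (10, 60, 300):
    load, delay = polling_cost(10_000, interval)
    print(f"every {interval:>3} s: {load:7.1f} req/s, average delay {delay:5.1f} s")
```

Note that load times delay is always N/2: with plain polling, halving the delay means doubling the load.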

Push is better. There is no need to poll. Twitter sends the updates to you immediately when it has new messages. Fast and light.

On the other hand, the Push method requires you to be connected in order to receive anything. With Pull, you can be behind firewalls or even offline and come online only to poll.
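
Pull tolerates being offline because the client remembers where it left off. A minimal sketch, assuming a hypothetical server that numbers messages sequentially and answers "give me everything after ID x" (`MessageLog`, `PullClient` and `since` are illustrative names, not Twitter's API): a client that was away loses nothing, it just gets a bigger batch on its next poll. A Push server would have to buffer undelivered messages itself to offer the same guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class MessageLog:
    """Hypothetical server-side log; each message gets the next sequential ID."""
    messages: list[tuple[int, str]] = field(default_factory=list)

    def post(self, text: str) -> int:
        msg_id = len(self.messages) + 1
        self.messages.append((msg_id, text))
        return msg_id

    def since(self, last_seen_id: int) -> list[tuple[int, str]]:
        """The pull query: every message newer than last_seen_id."""
        return [m for m in self.messages if m[0] > last_seen_id]


@dataclass
class PullClient:
    log: MessageLog
    last_seen_id: int = 0  # client-side cursor; survives going offline

    def poll(self) -> list[str]:
        new = self.log.since(self.last_seen_id)
        if new:
            self.last_seen_id = new[-1][0]
        return [text for _, text in new]


log = MessageLog()
client = PullClient(log)
log.post("going to lunch")
print(client.poll())  # ['going to lunch']
# The client goes offline; two messages are posted meanwhile.
log.post("back at desk")
log.post("meeting at 3")
print(client.poll())  # ['back at desk', 'meeting at 3']
print(client.poll())  # []
```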

I wonder which is better for sending updates from a server to clients. Should clients poll the server, or should they stay connected to it so that the server can push new data to them whenever it wants?

Pull can be made lighter, e.g. by sending the poll as a single UDP packet and using the more demanding HTTP over TCP only when that UDP query reports that something new is available. Another nice trick is used by the ClamAV virus scanner: it uses DNS to publish the version number of its latest database. A simple DNS query reveals whether a new database needs to be downloaded.
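
A minimal sketch of the two-stage Pull, assuming a made-up protocol: the client sends one UDP datagram `VERSION?`, the server replies with a version number, and only if that number is newer than the local one does the client do the expensive download. The `fetch_full` callback stands in for the HTTP request; the query string and helper names are illustrative, and ClamAV's real DNS-based check is not reproduced here. Since UDP does not retransmit, the `settimeout` call makes a lost reply raise an exception instead of hanging forever.

```python
import socket
import threading

# Hypothetical wire format: client sends QUERY, server answers with ASCII digits.
QUERY = b"VERSION?"


def serve_version_once(sock: socket.socket, version: int) -> None:
    """Toy server: answer a single UDP version query."""
    data, addr = sock.recvfrom(512)
    if data == QUERY:
        sock.sendto(str(version).encode("ascii"), addr)


def remote_version(server_addr: tuple[str, int], timeout: float = 2.0) -> int:
    """Cheap check: one datagram out, one back, no TCP handshake."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(QUERY, server_addr)
        reply, _ = sock.recvfrom(512)
    return int(reply.decode("ascii"))


def poll(server_addr, local_version: int, fetch_full) -> int:
    """Run the expensive fetch (e.g. HTTP over TCP) only if the server is newer."""
    latest = remote_version(server_addr)
    if latest > local_version:
        fetch_full(latest)
        return latest
    return local_version


server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
t = threading.Thread(target=serve_version_once, args=(server, 42))
t.start()

fetched = []
new_version = poll(server.getsockname(), local_version=41, fetch_full=fetched.append)
t.join()
server.close()
print(new_version, fetched)  # 42 [42]
```

When the client is already up to date, the whole poll costs two small datagrams and no download.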

On the other hand, could you keep TCP connections open to the clients to implement Push? How much memory does an idle TCP connection consume? Are there limits to how many clients can be connected to a single port?
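
To make the question concrete, here is a minimal Push sketch using Python's asyncio: the server keeps every client's TCP connection open in a set and writes to all of them when there is news. It measures nothing about memory or connection limits; it only shows the shape of the design. `PushServer`, `push` and the demo message are illustrative names, not part of any real Twitter or IM system, and a real server would also need heartbeats to notice clients that vanish without closing the connection.

```python
import asyncio


class PushServer:
    """Toy Push server: keeps each client's TCP connection open and writes to all."""

    def __init__(self) -> None:
        self.clients: set[asyncio.StreamWriter] = set()

    async def handle_client(self, reader: asyncio.StreamReader,
                            writer: asyncio.StreamWriter) -> None:
        self.clients.add(writer)
        try:
            # The connection mostly sits idle; read() returns b"" when the client hangs up.
            while await reader.read(1024):
                pass
        finally:
            self.clients.discard(writer)
            writer.close()

    async def push(self, message: str) -> None:
        """Send one line to every connected client, then wait for the buffers to flush."""
        data = (message + "\n").encode()
        targets = list(self.clients)
        for writer in targets:
            writer.write(data)
        await asyncio.gather(*(w.drain() for w in targets), return_exceptions=True)


async def demo() -> list[str]:
    server = PushServer()
    tcp = await asyncio.start_server(server.handle_client, "127.0.0.1", 0)
    port = tcp.sockets[0].getsockname()[1]

    # Two clients connect and then simply wait for data to arrive.
    conns = [await asyncio.open_connection("127.0.0.1", port) for _ in range(2)]
    while len(server.clients) < 2:  # let the server register both connections
        await asyncio.sleep(0.01)

    await server.push("going to lunch")
    received = [(await reader.readline()).decode().strip() for reader, _ in conns]

    for _, writer in conns:
        writer.close()
        await writer.wait_closed()
    while server.clients:  # wait for the server-side handlers to notice
        await asyncio.sleep(0.01)
    tcp.close()
    await tcp.wait_closed()
    return received


print(asyncio.run(demo()))  # ['going to lunch', 'going to lunch']
```

The cost the post asks about is visible here: every entry in `clients` is an open socket the server must hold even when nothing is happening, and a client that is offline simply misses the push unless the server buffers it.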

Then there is the wacky idea of doing Push by making the server connect to the clients. I used to work on a system that worked like that. It has its benefits, but it needs lots of smarts in the clients, which have to be able to deal with firewalls and set up NAT traversal or port forwarding automatically (SOCKS, UPnP, Bonjour).

I have no answers at this time. The world in general seems to be very much Pull-oriented these days, with the Web, RSS feeds, and their underlying protocol, HTTP. On the other hand, Push is also widely used, mainly in instant messaging systems like IRC, MSN, AIM, ICQ, Skype, and others.

2 comments:

Juha Autero said...

You forgot a 5th (or actually 3rd) method: use the Twitter API. And yes, my biggest gripe with it is that it's push, not pull.

Juha Autero said...

Sorry, meant pull, not push.