Utterances of a Zimboe

Programming the Internet.

Information encoding, revisited


Update 9/25: A little follow-up.

(A concrete protocol is defined by schema and encoding. I.e., each version of an XML schema is a new protocol. Understanding this is a prerequisite for reading any further.)

Stefan Tilkov notes some basic differences between AMQP and XMPP, and this one in particular caught my eye:

One of my motivations, though, was that XMPP is based on XML, while AMQP (AFAIK) is binary. This suggests to me that AMQP will probably outperform XMPP for any given scenario — at the cost of interoperability (e.g. with regard to i18n).

Which is, of course, quite correct. XML is a bad choice for a transport protocol for a number of reasons: it is complex to parse and too ambiguous. These issues are not that critical when dealing with documents, but in live communications all complications tend to escalate sharply.

And that’s exactly why I really would like to see some experimentation with binary encodings—at least for protocols, if not for other uses of XML. It’s only just over a year since I last argued this point. Unfortunately enough, other stuff has taken precedence over my tinkering with encodings.

Quoting myself: ;)

And, as the binary format should be the most efficient way to represent that data, it really is the leanest ad-hoc binary format.

I.e., when done right (and why wouldn’t it be), EXI/whatever should be very competitive against any proprietary protocol; XML itself shouldn’t put many restrictions on the efficiency of information encoding. Quite the opposite: when both sides know the respective ‘schema’ (a protocol spec, like AMQP), the amount of space used for describing structure in the actual transferred content should be minimal. So there shouldn’t be any reason at all to handcraft more transfer formats; the room for further optimizations should be quite limited.

I must emphasize the following point, as it’s constantly ‘misperceived’: binary encoding is not for compression, but for making parsing simpler and more robust. (Well, that’s the most important feature for me, at least.) Naturally, binary encoding helps save some space, as you don’t need to spell out the tags all the time, but it certainly doesn’t compress the actual content. Of course, compression is a good selling point (to the masses), but no sentient being should constrain itself by focusing only on that petty (uninteresting, trivial) property.
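To make that point a bit more tangible, here is a minimal sketch in Python. It is not EXI and not any real wire format. The three-field schema, the 2-byte length prefix and all function names are assumptions made up for illustration. Both peers are assumed to know the field order, so no tag names travel on the wire; the values are copied verbatim, and the parser never has to scan for delimiters or unescape anything.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical schema known to both peers: the field order is fixed,
# so field names never need to be sent.
SCHEMA = ("sender", "recipient", "body")


def encode_xml(msg: dict) -> bytes:
    """Plain XML text encoding of the same message, for size comparison."""
    root = ET.Element("message")
    for name in SCHEMA:
        ET.SubElement(root, name).text = msg[name]
    return ET.tostring(root, encoding="unicode").encode("utf-8")


def encode_binary(msg: dict) -> bytes:
    """Length-prefixed encoding: 2-byte big-endian length, then raw bytes."""
    out = bytearray()
    for name in SCHEMA:
        value = msg[name].encode("utf-8")
        out += struct.pack("!H", len(value))
        out += value  # content is copied verbatim, not compressed
    return bytes(out)


def decode_binary(data: bytes) -> dict:
    """Parse by reading lengths; no scanning, escaping or entity handling."""
    msg, offset = {}, 0
    for name in SCHEMA:
        if offset + 2 > len(data):
            raise ValueError(f"missing length prefix for field {name!r}")
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
        if offset + length > len(data):
            raise ValueError(f"truncated field {name!r}")
        msg[name] = data[offset:offset + length].decode("utf-8")
        offset += length
    if offset != len(data):
        raise ValueError("trailing bytes after last field")
    return msg
```

In this sketch the only structural overhead is two octets per field, while the content bytes are exactly as long as they were in the original strings. Whatever space is saved comes from dropping the tags, not from compressing anything.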

Elaborating just a little bit more, the point of binary encoding is exactly that it is just an encoding. That’s an aspect/abstraction completely independent of the data model. So XML always keeps its extensibility, whether it is encoded with an excess of tags or with some simple, single octets (or so, ykwim). I don’t know how extensible AMQP is, but I’m pretty sure it isn’t extensible in an ‘industry standard way’.
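A rough Python sketch of that independence, again under loud assumptions: the tag table, the escape code for unknown tags and the token layout are all invented here and have nothing to do with the actual EXI format. Attributes and tail text are ignored to keep it short. The same element tree goes to octets and back, and an element that is not in the shared table still survives the trip, so the extensibility stays with the data model.

```python
import struct
import xml.etree.ElementTree as ET
from typing import Tuple

# Hypothetical tag table agreed on by both peers (stands in for a schema).
TAG_CODES = {"message": 1, "to": 2, "body": 3}
TAG_NAMES = {code: name for name, code in TAG_CODES.items()}
LITERAL = 0  # escape code: the tag name follows as a length-prefixed string


def _put_str(out: bytearray, text: str) -> None:
    raw = text.encode("utf-8")
    out += struct.pack("!H", len(raw)) + raw


def _get_str(data: bytes, pos: int) -> Tuple[str, int]:
    (n,) = struct.unpack_from("!H", data, pos)
    pos += 2
    return data[pos:pos + n].decode("utf-8"), pos + n


def encode(elem: ET.Element, out: bytearray) -> bytearray:
    """Layout per element: tag code, [tag name], text, child count, children."""
    code = TAG_CODES.get(elem.tag, LITERAL)
    out.append(code)
    if code == LITERAL:
        _put_str(out, elem.tag)  # extension element that is not in the table
    _put_str(out, elem.text or "")
    out.append(len(elem))  # child count (this sketch allows at most 255)
    for child in elem:
        encode(child, out)
    return out


def decode(data: bytes, pos: int = 0) -> Tuple[ET.Element, int]:
    """Rebuild the element tree; returns the element and the next offset."""
    code = data[pos]
    pos += 1
    if code == LITERAL:
        tag, pos = _get_str(data, pos)
    else:
        tag = TAG_NAMES[code]
    elem = ET.Element(tag)
    text, pos = _get_str(data, pos)
    elem.text = text or None
    count = data[pos]
    pos += 1
    for _ in range(count):
        child, pos = decode(data, pos)
        elem.append(child)
    return elem, pos
```

For example, parsing `<message><to>bob</to><body>hi</body><priority>high</priority></message>` with `ET.fromstring`, passing it through `encode` and then `decode`, gives back a tree that serializes to the same XML text, even though `priority` is unknown to the table and only that one tag name is spelled out in the octets.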


This kind of a rant this time. Take care. :)



Written by Janne Savukoski

September 18, 2007 at 11:07 pm

Posted in Internet

