Information encoding, revisited
Update 9/25: A little follow-up.
(A concrete protocol is defined by schema and encoding. I.e., each version of an XML schema is a new protocol. Understanding this is a prerequisite for reading any further.)
Stefan Tilkov notes some basic differences between AMQP and XMPP, and this one in particular caught my eye:
One of my motivations, though, was that XMPP is based on XML, while AMQP (AFAIK) is binary. This suggests to me that AMQP will probably outperform XMPP for any given scenario — at the cost of interoperability (e.g. with regard to i18n).
Which is of course correct. XML is a bad choice for a transport protocol for a number of reasons: it is complex to parse and too ambiguous. That is not so critical when dealing with documents, but in live communications every complication tends to escalate sharply.
And that’s exactly why I really would like to see some experimentation with binary encodings—at least for protocols, if not for other uses of XML. It’s only just over a year since I last argued this point. Unfortunately enough, other stuff has taken precedence over my tinkering with encodings.
Quoting myself: ;)
And, as the binary format should be the most efficient way to represent that data, it really is the leanest ad-hoc binary format.
I.e., when done right (and why wouldn’t it be), EXI or whatever should be very competitive against any proprietary protocol; XML itself shouldn’t put many restrictions on the efficiency of information encoding. Rather the opposite: when both sides know the respective ‘schema’ (a protocol spec, like AMQP’s), the space spent on describing structure in the actual transferred content should be minimal. So there shouldn’t be any reason at all to handcraft more transfer formats; the room for further optimization should be quite limited.
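To make the "both sides know the schema" point a bit more tangible, here is a minimal Python sketch. It is not EXI and not AMQP; the tag table `SCHEMA`, the field names and the 3-byte header layout are all made up for illustration. It only shows how little structure needs to travel once the element names are agreed on in advance.

```python
import struct

# Hypothetical schema agreed on by both peers in advance: element name -> 1-byte code.
SCHEMA = {"message": 1, "to": 2, "body": 3}
NAMES = {code: name for name, code in SCHEMA.items()}

def encode(fields):
    """Encode (name, text) pairs as: 1-byte tag code, 2-byte big-endian length, UTF-8 payload."""
    out = bytearray()
    for name, text in fields:
        payload = text.encode("utf-8")
        out += struct.pack(">BH", SCHEMA[name], len(payload)) + payload
    return bytes(out)

def decode(data):
    """Inverse of encode(): walk the buffer header by header, no scanning for delimiters."""
    fields = []
    pos = 0
    while pos < len(data):
        code, length = struct.unpack_from(">BH", data, pos)
        pos += 3
        fields.append((NAMES[code], data[pos:pos + length].decode("utf-8")))
        pos += length
    return fields

fields = [("to", "alice@example.org"), ("body", "hello")]
xml_text = "<to>alice@example.org</to><body>hello</body>".encode("utf-8")
binary = encode(fields)

# Bytes spent on structure rather than on content, in each encoding.
content_size = sum(len(text.encode("utf-8")) for _, text in fields)
print(len(xml_text) - content_size, len(binary) - content_size)
```

The content bytes are identical in both forms; only the bytes describing structure differ, and in the binary form they are a fixed three bytes per field.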
I must emphasize the following point, as it’s constantly ‘misperceived’: binary encoding is not for compression, but for making parsing simpler and more robust. (Well, that’s the most important feature for me, at least.) Naturally, binary encoding saves some space, as you don’t need to spell out the tags all the time, but it certainly doesn’t compress the actual content. Of course, the compression is a good selling point (to the masses), but no sentient being should constrain itself by focusing only on that petty (uninteresting, trivial) property.
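A small sketch of what "simpler parsing, no compression" could look like in practice, assuming a plain 4-byte length prefix (my choice for the example, not something any of the specs above mandate). The helper names `frame` and `unframe` are hypothetical.

```python
from xml.sax.saxutils import escape

def frame(payload: bytes) -> bytes:
    """Length-prefixed framing: 4-byte big-endian length, then the payload untouched."""
    return len(payload).to_bytes(4, "big") + payload

def unframe(data: bytes) -> bytes:
    """Read the length, slice the payload. No escaping rules, no delimiter scanning."""
    length = int.from_bytes(data[:4], "big")
    return data[4:4 + length]

content = "if a < b && b > c: print('&')"
raw = content.encode("utf-8")

# Textual XML must escape markup characters inside the content...
as_xml = ("<body>" + escape(content) + "</body>").encode("utf-8")
# ...while the framed form carries the content bytes verbatim.
as_binary = frame(raw)

print(len(raw), len(as_binary), len(as_xml))
```

The framed payload is exactly the original bytes plus four, so nothing is compressed; what goes away is the need to escape on the way in and to unescape and hunt for closing tags on the way out.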
Elaborating just a little more, the point of binary encoding is exactly that it is just an encoding: an aspect completely independent of the data model. So XML always keeps its extensibility, whether it is encoded with an excess of tags or with a few simple octets (or so, ykwim). I don’t know how extensible AMQP is, but I’m pretty sure it isn’t extensible in an ‘industry standard way’.
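As a rough illustration of "just an encoding", the following Python sketch round-trips an XML element tree through a toy binary form. Everything about the format is an assumption for the example: the `KNOWN` tag table, code 0 meaning "name follows inline", and the one-byte child count. Attributes and tail text are ignored to keep it short. The `priority` element is not in the table, yet it survives, which is the extensibility point.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag table known to both peers; code 0 is reserved for unknown (extension) tags.
KNOWN = {"message": 1, "body": 2}
BY_CODE = {code: name for name, code in KNOWN.items()}

def _put_str(out, s):
    b = s.encode("utf-8")
    out += len(b).to_bytes(2, "big") + b

def _get_str(data, pos):
    n = int.from_bytes(data[pos:pos + 2], "big")
    pos += 2
    return data[pos:pos + n].decode("utf-8"), pos + n

def encode(elem, out=None):
    """Serialize tag, text and children; attributes are ignored in this sketch."""
    out = bytearray() if out is None else out
    code = KNOWN.get(elem.tag, 0)
    out.append(code)
    if code == 0:
        _put_str(out, elem.tag)      # extension element: its name travels inline
    _put_str(out, elem.text or "")
    out.append(len(elem))            # child count, at most 255 here
    for child in elem:
        encode(child, out)
    return bytes(out)

def decode(data, pos=0):
    """Rebuild the element tree; returns (element, next read position)."""
    code = data[pos]
    pos += 1
    if code == 0:
        tag, pos = _get_str(data, pos)
    else:
        tag = BY_CODE[code]
    text, pos = _get_str(data, pos)
    elem = ET.Element(tag)
    elem.text = text
    count = data[pos]
    pos += 1
    for _ in range(count):
        child, pos = decode(data, pos)
        elem.append(child)
    return elem, pos

doc = ET.fromstring("<message><body>hi</body><priority>high</priority></message>")
blob = encode(doc)
restored, _ = decode(blob)
print(ET.tostring(restored).decode("utf-8"))
```

The data model on both ends is the same element tree; only the bytes on the wire changed.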
Summary:
- XML standard encoding: sucks (in protocols, ok in blogging)
- XML data model: rocks
This kind of a rant this time. Take care. :)


