The Importance of XML

Quentin Stafford-Fraser

I suspect there may be a great deal of confusion amongst non-technical people about what it is, exactly, that the much-hyped XML gives you. There's no shortage of confusion amongst technical people, either. So here's my attempt to summarise why it is so important.

XML is a standard for describing how information is structured. This makes it much easier to move structured information from place to place, or from one program to another. However, what it doesn't do is specify how information should be structured, beyond a few basic precepts.

So it doesn't (as some would have you believe) solve all the world's problems. If my invoicing system has a very different idea from yours, for example, about the information that should be held on a given customer, then putting it into XML is not going to magically fill in the missing bits. Similarly, if you store a customer's full name, and I want to store first name, middle initial, and surname, then XML will not tell you how to split up your customer's names. Data conversion between formats can still be fiddly and tedious.
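
To make the name example concrete, here is a minimal sketch in Python using the standard library's ElementTree module. The element names (customer, name, first, initial, surname) and the split_name helper are invented for illustration. Reading your record is trivial because it is XML, but the splitting rule is still a guess that somebody has to write and maintain.

```python
import xml.etree.ElementTree as ET

# Your system (hypothetical layout): one field holding the full name.
yours = ET.fromstring("<customer><name>Ada M Lovelace</name></customer>")


def split_name(full_name):
    """Naive guess: first word, last word, first letter of any middle word.

    XML offers no help here. The rule is an assumption, and it mangles
    names such as 'Ludwig van Beethoven'.
    """
    parts = full_name.split()
    first, surname = parts[0], parts[-1]
    middle = parts[1:-1]
    initial = middle[0][0] if middle else ""
    return first, initial, surname


first, initial, surname = split_name(yours.findtext("name"))

# My system (hypothetical layout): three separate fields.
mine = ET.Element("customer")
ET.SubElement(mine, "first").text = first
ET.SubElement(mine, "initial").text = initial
ET.SubElement(mine, "surname").text = surname
converted = ET.tostring(mine, encoding="unicode")
print(converted)
```

The parsing and the writing each take a line or two. The fiddly part is the function in the middle, which is exactly the part XML leaves to you.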

However, if you dismiss XML because of this, then you miss an important part of the picture. XML is a standard language for talking about data. It performs a very similar function to ASCII (a standard way of encoding characters as bytes) or SQL (a standard way of asking questions of a database). Having it is an important step on the ladder. Many years ago there was a time, before ASCII, when any text transmitted as a sequence of bytes from one computer to another might not be understood because the two computers didn't agree on which number should represent the letter 'A'. Thankfully those days are now largely behind us. Programmers no longer have to think about the conversions before they even start looking at the contents.

XML performs a similar function when exchanging data which has a structure more complex than simply a string of text. There are standard instructions that XML programmers can use to find the third sub-item of a group, in much the same way that there are standard calls to find the length of a piece of ASCII text. Programmers no longer have to think about how to decode the structure before they even look at the contents. XML certainly isn't perfect, and there will always be situations where system designers will want to create their own custom-built alternative. But in the past they had little choice. Now, you can just use XML unless you have a good reason not to. And if you do use it, a whole world of tools, utilities, conventions, and libraries of code is available to help you along.
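
As a rough illustration of "find the third sub-item of a group", here is a short Python sketch using the standard library's ElementTree module. The document and its element names (order, item) are made up for the example; the point is only that the lookup is a standard call rather than hand-written decoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element names are invented for this example.
doc = """
<order>
  <item>apples</item>
  <item>pears</item>
  <item>plums</item>
  <item>quinces</item>
</order>
"""

root = ET.fromstring(doc)

# ElementTree's path syntax counts positions from 1,
# so "item[3]" selects the third <item> child.
third_text = root.find("item[3]").text

# The same lookup using ordinary 0-based list indexing.
third_by_index = root.findall("item")[2].text

# A rough analogue of asking for the length of a piece of text.
item_count = len(root.findall("item"))

print(third_text, item_count)
```

Nothing in this code knows how the text was laid out on disk: indentation, line breaks, and quoting are all handled by the parser.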

XML has another important function, though. While it doesn't specify how information on a particular topic should be structured, it does provide a syntax for writing such specifications, called XML Schemas or DTDs (document type definitions). If I have a music-composing program which can output XML conforming to a particular schema, and you have a music-printing program which can read XML conforming to the same schema, then we have a shared understanding of the way the information is structured. And because XML is an open standard which isn't owned by anybody, there's less incentive for people to feel proprietary about the way that their data is stored in it. So, slowly but surely, people are starting to agree on schemas for storing and transmitting information on particular topics. This should mean that in future, your customers' names are more likely to be stored in a similar way to mine, and transferring information between us should be easier.
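
The music example might look something like the following Python sketch. Everything specific in it is an assumption made for illustration: the DTD, the element and attribute names (tune, title, note, pitch, beats), and the two functions standing in for the composing and printing programs. The DTD is shown only as the written agreement between the two sides; Python's standard library parses XML but does not validate it against a DTD, so a real system would need a validating parser for that step.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD that both programs agree on. It is kept here purely as
# documentation: nothing below checks documents against it.
TUNE_DTD = """
<!ELEMENT tune (title, note+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT note EMPTY>
<!ATTLIST note pitch CDATA #REQUIRED beats CDATA #REQUIRED>
"""


def compose(title, notes):
    """Composing side: write (pitch, beats) pairs as XML in the agreed shape."""
    tune = ET.Element("tune")
    ET.SubElement(tune, "title").text = title
    for pitch, beats in notes:
        ET.SubElement(tune, "note", pitch=pitch, beats=str(beats))
    return ET.tostring(tune, encoding="unicode")


def read_tune(xml_text):
    """Printing side: recover the title and notes using only the agreed shape."""
    tune = ET.fromstring(xml_text)
    title = tune.findtext("title")
    notes = [(n.get("pitch"), int(n.get("beats"))) for n in tune.findall("note")]
    return title, notes


xml_text = compose("Scale fragment", [("C4", 1), ("D4", 1), ("E4", 2)])
title, notes = read_tune(xml_text)
print(title, notes)
```

Neither function refers to the other. The only thing they share is the agreed structure, which is what allows the two programs to be written by different people.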

In summary, here's an analogy. Imagine that you have to organise a large meeting for lots of people from different companies and different cultures. They might all disagree on which items should be on the agenda. They might also disagree on how such meetings should be organised. But you are never going to make any progress on either of these if the participants all speak different languages. Once you agree on a common language, you can concentrate on higher-level things.

That's the problem XML is addressing, and that's why it's so important.