Is DOCX really an open standard?

by Abhishek Bhatnagar

It is hard to believe that even in 2012 we struggle with standards as common as those for documents, presentations and spreadsheets. The de facto formats for these are of course the ones used by Microsoft Office: docx, pptx and xlsx, collectively called OpenXML or OOXML. This causes the growing number of Libre Office and Open Office users, myself included, much chagrin.

Like everyone else, I find that the majority of office files arriving in my inbox are in one of the OOXML formats. Invariably, when I edit a document and return it to its owner, they complain that I have in some way corrupted or changed the elements within because of my choice of software, which is usually true. Then they berate me for using “crappy” open source software and, in one case, for being an “anti-Microsoft hippy”.

Let’s be clear, I am not an anti-Microsoft hippy. Like many of you, I run Linux and under a normal scenario, do not have access to Windows, so running MS Office is really not an option. Even if it were, I would detest having to pay for it. So for the simple reason of including myself and the millions of others who use the various open office suites out there, I request that you stop using OOXML formats, at least until Microsoft truly supports them in MS Office.

I’ve been angrily told before that OOXML or OpenXML is indeed an open format, which is technically correct. But there’s more to the story than that. If there weren’t, Libre Office and Open Office would have built perfect support for it a long time ago. Their developers realize that incomplete support for Microsoft formats is one of the main things that drives new users away, so they would not leave OOXML half-implemented by choice.

The real reason these suites do not fully support OOXML is that there is a difference between the OOXML specification and the OOXML implementation in MS Office. To understand why, you have to familiarize yourself with three standards:

  • ECMA 376
  • ISO/IEC 29500 Transitional
  • ISO/IEC 29500 Strict

ECMA is a private international standards organization much like the better known ISO. The difference between the two is that ECMA is made up of companies, while ISO is made up of countries. There is certainly a need for both of them in the technology market.

ISO, along with a consortium called OASIS, adopted ODF (Open Document Format) back in 2006 to solve the document standardization crisis. This is the format used by Libre Office and Open Office, along with most other open office suites. Such a format becoming successful would of course threaten Microsoft's already established monopoly in the document market, which at the time ran on closed formats such as doc, ppt, and xls. So Microsoft decided to create its own open standard through ECMA, called OpenXML or OOXML and otherwise known as ECMA-376, which ECMA approved in December 2006. This was the new “XML based” replacement for ODF, which of course seemed unnecessary to ISO and was initially rejected. But with the use of some muscle, Microsoft got the proposal fast-tracked in ISO even though reportedly 20 of the 30 countries involved were not interested in passing it. This, however, didn’t stop Lisa Rajchel of the ISO secretariat from pushing it through anyway after deciding “to move Open XML forward after consulting with staff at the International Technology Task Force”.

So ISO had a new incoming standard, but specific clauses of it still met resistance. To solve this problem, it was proposed that OOXML be split into two sub-standards, namely ISO 29500 Transitional and ISO 29500 Strict. The Strict version was the one accepted by ISO, and the Transitional version was granted to Microsoft to let them gradually phase out older features from the closed-format days. There is nothing wrong with this; it’s only fair to their users.

However, the problem arose when Microsoft decided not to fully implement the Strict version of the standard in Office 2010. As published by Microsoft and stated by Wikipedia:

Microsoft Office 2010 provides read support for ECMA-376, read/write support for ISO/IEC 29500 Transitional, and read support for ISO/IEC 29500 Strict.

What this means is that when you save a document in MS Office 2010 or earlier in any of the ‘X’ formats, you are not saving it in the advertised OpenXML format, that is, the Strict version that ISO accepted. Office 2010 can read Strict, but it only writes Transitional. Such a document will hence NOT be properly readable by other software such as Libre Office and Open Office, and they will make changes to it when it is opened and saved within them. The problem hence lies with MS Office, not with the open suites.
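If you are curious which flavor a particular file uses, one rough check is to look at the XML namespace on the root element of the main document part, since Strict and Transitional files declare different namespaces. The following is only a minimal Python sketch, not a conformance validator. It assumes the main part sits at the conventional path `word/document.xml`, and the function name `docx_conformance` is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Root namespaces of the main WordprocessingML part for each flavor.
TRANSITIONAL_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STRICT_NS = "http://purl.oclc.org/ooxml/wordprocessingml/main"


def docx_conformance(source):
    """Return "strict", "transitional" or "unknown" for a .docx file.

    `source` is a file path or a binary file-like object.
    Assumption: the main part is stored at word/document.xml, which is
    the usual location but is not guaranteed by the packaging rules.
    """
    # A .docx file is a ZIP archive containing XML parts.
    with zipfile.ZipFile(source) as package:
        with package.open("word/document.xml") as part:
            root_tag = ET.parse(part).getroot().tag

    # ElementTree writes qualified tags as "{namespace}localname".
    namespace = root_tag[1:].split("}", 1)[0] if root_tag.startswith("{") else ""

    if namespace == STRICT_NS:
        return "strict"
    if namespace == TRANSITIONAL_NS:
        return "transitional"
    return "unknown"
```

Calling `docx_conformance("report.docx")` on a file of your own (the file name here is just a placeholder) returns one of the three strings. It only inspects the root namespace, so it says nothing about whether the rest of the file actually follows the specification.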

But, to be fair, we should note that we have been promised full ODF support in the upcoming Office 15. Alex Brown has an excellent post on this subject, with more details about the gap between the promises Microsoft made in 2008 and what they actually delivered in 2010. Hopefully they won’t repeat that pattern and will actually keep their promises this time. I am genuinely excited to find out.

Lately there has been a shift towards the usage of PDF, especially when it comes to documents that do not need to be edited such as resumes, essays, and reports. The reason for the change of course is an organic realization that PDF is a no-bullshit format that works consistently and predictably across all platforms. While PDF is not exactly an open format, Adobe does provide free and consistent specifications for all to implement it as they please. If you are an MS Office user and also have been a part of the great PDF shift, you too have something to gain from the true open implementation of OOXML.

I would still prefer to see ODF win the battle, but if a true open implementation of OOXML does happen, then at least there will be far fewer reasons to complain. Plus, Libre Office developers won’t be jerked around as much trying to play catch-up with an always moving target.

Anyway, in the meantime, please save your documents in ODF when you use Microsoft Office.