Subject: Eric Vought Contribution on Behalf of The Misty Manor, Mercers Regarding ISO/IEC DIS 29500 (ECMA-376)
It has come
to my attention that ANSI abstained from participating in the 30-day comment
period for the ISO/IEC JTC 1 Fast Track process for the ECMA-376 standard
"Office Open XML File Formats" due to an inability to reach consensus. As
a business owner, former IT professional, former member of the Austin Group
technical committee, and interested party in the recent efforts to standardize
the preparation and storage of electronic government documents, I am writing
to express my deep concerns with the proposed standard.
First of all, as a number of ISO representatives
and IT professionals have noted, ECMA-376, a specification derived from Microsoft Office
2007's Office XML format, obviously conflicts with and duplicates the scope
and purpose of ISO 26300 (ODF).
ECMA also states that ECMA-376 and ISO 26300 serve
different markets: while ISO 26300 was designed from the
ground up to serve existing and future document needs in a sensible,
standards-conformant manner, ECMA-376 was also designed to serve the
needs of legacy documents stored in a number of existing
binary formats. It may be argued from an examination of the ECMA-376 specification
that the data model of these legacy formats was the driving design factor.
While this does, indeed, distinguish the purposes
of the two formats, the latter purpose is of dubious value for an international standard.
ECMA-376's distinguishing feature becomes that it takes a number of obscure,
complex, and opaque binary file formats and converts them into a single, monstrously
complex (over 6,000 pages), obscure, and opaque text format that
contains numerous references to legacy application behavior that is
never described. Rather than reference existing ISO standards for, e.g.,
times and dates, inline graphics, percentages, colors, and citations,
ECMA-376 defines its own legacy encodings and, in some cases, multiple conflicting
encodings for similar data. It is difficult to see why anyone would want
to recommend this format for new documents, and it makes little sense to
create a standard that is deprecated from its inception.
There are many examples of strange format choices
within the ECMA-376 document structure. There are two calendars for expressing
dates. One is a Gregorian calendar with a 1904 epoch; the other
has a 1900 epoch and an (incorrect) assumption that 1900 was a leap year. Rather
than placing the burden of normalizing date stamps on conversion programs,
the format continues to propagate a bug from Lotus 1-2-3 date handling.
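The 1900 leap-year quirk is easy to state but awkward to handle. The following is a minimal Python sketch of what a normalizing conversion program has to do, assuming the commonly documented mapping of whole-day serial numbers: in the 1900 system, serial 1 is 1900-01-01 and serial 60 is the nonexistent 1900-02-29; in the 1904 system, serial 0 is 1904-01-01. The function name `serial_to_date` is hypothetical, and fractional (time-of-day) serials are ignored.

```python
from datetime import date, timedelta


def serial_to_date(serial: int, date1904: bool = False) -> date:
    """Convert a whole-day spreadsheet serial number to a real calendar date.

    Hypothetical helper. Assumes the two legacy date systems:
    - 1904 system: serial 0 is 1904-01-01.
    - 1900 system: serial 1 is 1900-01-01, and serial 60 stands for the
      nonexistent 1900-02-29 inherited from Lotus 1-2-3.
    """
    if date1904:
        if serial < 0:
            raise ValueError("negative serials are not handled in this sketch")
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial < 1:
        raise ValueError("1900-system serials start at 1")
    if serial < 60:
        # Before the phantom leap day, counting from 1899-12-31 is exact.
        return date(1899, 12, 31) + timedelta(days=serial)
    if serial == 60:
        # No real date exists for this serial; a converter must pick a policy.
        raise ValueError("serial 60 is the nonexistent 1900-02-29")
    # After the phantom day every serial is one too high, so the
    # effective epoch moves back one day to 1899-12-30.
    return date(1899, 12, 30) + timedelta(days=serial)


# A normalizing converter could emit ISO 8601 strings instead of serials.
print(serial_to_date(59).isoformat())                # 1900-02-28
print(serial_to_date(61).isoformat())                # 1900-03-01
print(serial_to_date(0, date1904=True).isoformat())  # 1904-01-01
```

Rejecting serial 60 outright is a choice made for this sketch; a real converter would need a policy (reject, clamp to a neighboring day, or ask a human), which is the kind of burden that belongs in a conversion tool rather than in the file format.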
Percentages are expressed inconsistently and sometimes bizarrely. In some
instances they are bare integers, such as "71" (in contrast to HTML's "71%").
In other places they are expressed as integer fiftieths of a percent, so
that, for instance, "200" represents "4%". In yet another place, they are represented
as discrete constants, such that "pct87" represents "87.5%" [not a typo].
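To make the inconsistency concrete, here is a minimal Python sketch of the three decoders an application would need for the encodings just described. All function names are hypothetical, and the constant table is deliberately partial: it contains only "pct87", the one value cited here, whereas a real converter would need every constant the specification enumerates.

```python
# Deliberately partial lookup table: only the constant cited in the text.
# A real converter would need the full enumeration from the specification.
PERCENT_CONSTANTS = {
    "pct87": 87.5,
}


def percent_from_bare_integer(text: str) -> float:
    """A bare integer such as "71", meaning 71%."""
    return float(int(text))


def percent_from_fiftieths(text: str) -> float:
    """An integer count of fiftieths of a percent: "200" means 4%."""
    return int(text) / 50


def percent_from_constant(token: str) -> float:
    """A named constant such as "pct87", meaning 87.5%."""
    try:
        return PERCENT_CONSTANTS[token]
    except KeyError:
        raise ValueError(f"unknown percentage constant: {token!r}") from None


def format_percent(value: float) -> str:
    """Normalize to one explicit notation, e.g. "87.5%"."""
    return f"{value:g}%"


print(format_percent(percent_from_bare_integer("71")))  # 71%
print(format_percent(percent_from_fiftieths("200")))    # 4%
print(format_percent(percent_from_constant("pct87")))   # 87.5%
```

Each decoder is trivial on its own; the cost lies in every consuming application having to know which of the three applies to each attribute.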
This is neither a single technical detail nor a series of isolated technical details,
but rather an overall design choice for ECMA-376 to remain as close to the
legacy data models as possible, thus requiring applications to retain those
data models in perpetuity. ISO 26300, by contrast, puts the burden on conversion
programs to retain knowledge of the legacy structures and normalize the
data when producing a conformant document.
This design
is most clearly expressed in ECMA-376 by tags like "autoSpaceLikeWord95",
which require the application to duplicate the unspecified behavior of
a legacy application.
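A converter cannot reproduce behavior that is never specified, but it can at least detect such flags and route the document for review. The following is a minimal Python sketch, assuming the flag appears as a child of the `w:compat` element in a document's settings part under the WordprocessingML namespace shown. The function name and the name-matching heuristic (looking for legacy product names inside the element name) are hypothetical illustrations, not a classification taken from ECMA-376.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used in a document's settings part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical heuristic: element names that mention a legacy product.
LEGACY_MARKERS = ("Word95", "Word97", "WW8")


def legacy_compat_flags(settings_xml: str) -> list[str]:
    """Return local names of w:compat children that look legacy-specific."""
    root = ET.fromstring(settings_xml)
    compat = root.find(f"{{{W_NS}}}compat")
    if compat is None:
        return []
    flags = []
    for child in compat:
        # ElementTree tags look like "{namespace}localName".
        local = child.tag.split("}", 1)[-1]
        if any(marker in local for marker in LEGACY_MARKERS):
            flags.append(local)
    return flags


sample = f"""<w:settings xmlns:w="{W_NS}">
  <w:compat>
    <w:autoSpaceLikeWord95/>
    <w:doNotExpandShiftReturn/>
  </w:compat>
</w:settings>"""

print(legacy_compat_flags(sample))  # ['autoSpaceLikeWord95']
```

Detection is the most a conforming application can do here without reverse-engineering the legacy product the flag names.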
When faced with converting a legacy document, an
application has several choices with regard to some of these obscure features
and idiosyncrasies:
1) Convert the document 1
2) Attempt to convert the legacy schema, with or
without human intervention, to adhere to modern conventions, such as correcting
broken dates and reinterpreting legacy layout options in terms of modern
features. Here again, it is difficult to see what benefit ECMA-376 adds,
as ISO 26300 was specifically designed for this case.
3) Convert to ISO-26300 1
The ECMA-376 format is of clear value to Microsoft
in supporting its legacy products and existing application suite. By documenting
this file format, Microsoft allows others to find value in the format as
they will and to increase interoperability with Office 2007. Microsoft certainly
deserves the community's thanks for this action. However, ECMA-376 is large and
overly complex, does not promote general interoperability, and is of dubious
value to the international standards community. Having two similar, competing
standards will confuse the marketplace and balkanize document
storage. ECMA-376's failure to use standard encodings for its internal structures
duplicates the work of existing, mature standards and locks application
developers into perpetual support of legacy, non-standard data schemas.
I hope that ANSI will pay due attention to this
standard as the JTC 1 process progresses.