Professor von Clueless in the Blunder Dome
Hangout for experimental confirmation and demonstration of software, computing, and networking. The exercises don't always work out. The professor is a bumbler and the laboratory assistant is a skanky dufus.
Monday, October 17, 2005

Magical Thinking and the Universal Document Elixir
As long as we’re sitting here by the campfire telling ghost stories about the great OpenDocument vs. Microsoft Office XML FUDwrestle, it is appropriate to discuss the really great idea that the OpenDocument format is designed to be a universal file format such that, according to one commenter, “all the information in any file format should be able to be stored in ODF without loss.” It is appealing to then conclude that “this would allow it to be use[d] as the native format in many applications and, most importantly, a universal translation method between any two different formats.”

What a wonderful straw man! What a beautiful dream. A universal format that serves as a universal document model that all formats can be translated through. Douglas Engelbart will be very happy to know that this knotty problem is solved and he can get on with the OHS and other projects dear to his heart.

Microsoft: Damned If You Do, Damned If You Don’t

What’s really great about this is how it makes such a cool mousetrap to use on the folks at Microsoft. Here’s where another comment took it:
Let me see, we’re supposed to assume that the magical binary key must exist because, if it didn’t, there would be transformation filters between MSXML (I am not sure which XML that is, but let’s suppose it’s the existing WordML just for clarity) and ODF. But there aren’t, so the magical invisible binary key must exist? Well, maybe there aren’t because “should be able to” is actually a really hard problem?

There are two difficulties here. One problem is that the commentator is quoting Gary Edwards again, and I’d really like to hear from someone else who can speak authoritatively about OpenDocument. I’d like some sense of who else is drinking the same Kool-Aid and, most of all, who is willing to provide some technical evidence for all of these weird claims. The other thing, and that is what I really want to talk about, is the presumption of universal translatability, if that is what is really meant (e.g., easy conversions over and back).

Is There a Universal Document Format?

I have my doubts whether a universal document format is even possible. I am willing to consider that some practical level of this might be accomplished for a selected set of cases and document models that can be conformed somehow. We’ve barely gotten to that level with programming languages (thanks to the .NET CLI, actually) after a quest of almost 50 years, and programming languages are easier (unless a human has to understand the result, and then it might be harder).

So what I’m looking for is not some vague claim of a dream fulfilled but a simple demonstration of how, and at what level, a universal transformation layer has actually been accomplished. What is the model, and what was concluded about the conditions under which inter-translation works? What are/were the metrics?

How’d This Become the Terms of Debate?

The basis for this claim is that interview of Edwards (sorry) where he is reported to have said
The interview goes on to reaffirm the universal transformation layer with
In the cited examples of publishing and content-management systems, nothing from Microsoft is mentioned. I also don’t see mention of TeX, PDF, DocBook (or SGML generally) or a contemporaneous ISO specification, the Open Document Architecture (ODA). Since these last are well- and fully-specified, I would think they’d make great tests for successful universal transformation.

What You See Is All You Get

Besides Doug Alberg of Boeing, Edwards also gives great credit to “legendary Daniel Vogelheim” (co-architect of the OpenOffice.org XML file format and a Sun Software Engineer) for this period of the work. Vogelheim is more conservative in his stance, according to Eric van der Vlist writing in <?xmlhack?>. It seems that Vogelheim takes “transformability” to mean that the format is usable outside of the office application, something which should be pretty much true of any XML format for a document, and which is the point of the examples that Brian Jones posts about integrating/blending WordML and Excel XML formats with business applications.

The full abstract for Vogelheim’s XML2002 talk expands on this notion. It is clear that extraction and repurposing is intended. Nowhere is there any claim for universal transformation between document formats, something Edwards appears to mean and that everyone else picks up on. This also appears to be the basis for whatever logic has people believe that all Microsoft has to do is adopt the OpenDocument format.

I’m willing to believe that Edwards is serious about this when he makes comments on Bob Sutor’s blog like, “The magic transformation qualities of ODF on the other hand are legendary, and it's only five years old!” I just can’t see anywhere that it has been handled.

Show Me the Elixir

Here is where I end up with this. If there were indeed a charge to ensure some degree of universal translation with ODF as an intermediary, there is no evidence of it in the OASIS Specification.
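To make the modest sense of “transformability” attributed to Vogelheim above a little more concrete (using a document’s XML outside the office application), here is a minimal Python sketch. The sample fragment and its element names (`doc`, `p`, `span`, `note`) are hypothetical stand-ins invented for illustration; they are not taken from the OASIS specification or from WordML.

```python
import xml.etree.ElementTree as ET

# Hypothetical document fragment. The element names are illustrative only
# and are NOT the actual OpenDocument or WordML vocabulary.
SAMPLE = """<doc>
  <p>The <span style="bold">quick</span> brown <note>fox</note> jumps.</p>
  <p>Second <span>para</span>graph.</p>
</doc>"""


def extract_paragraph_text(xml_source):
    """Return one plain-text string per <p>, ignoring every nested tag."""
    root = ET.fromstring(xml_source)
    # itertext() walks the text of an element and all its descendants in
    # document order, so dropping the inner tags is just a join.
    return ["".join(p.itertext()) for p in root.iter("p")]


if __name__ == "__main__":
    for line in extract_paragraph_text(SAMPLE):
        print(line)
```

Note what the sketch loses: the bold styling disappears, and the content of the nested `note` element is folded into the running text. That is extraction and repurposing, not a format-preserving translation.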
I did a search through the PDF for every occurrence of “transformation” in the document. Most occurrences have to do with transformation as used in presentation systems (such as Adobe PostScript) for transformations of drawing geometries. There are a few cases where design and feature changes are described in terms of making transformation of documents via XSLT a little easier. The key example, to my mind, is the design goal of having it be possible for any elements below the paragraph to be ignored (that is, the tags are dropped) and the remaining content be appropriate for text extraction. This is nothing like preserving formatting and document models and whatever else as part of a translation with ODF as a document lingua franca. [It also appears to capture hidden text.]

Most of these features are described in terms of how they should make such transformation easier. None of them seem to be about preserving the document in going from/to ODF. I also see this principle as a barrier to the successful translation of non-ODF document architectures to ODF, when that architecture depends on sub-paragraph elements with content that is not intended to be part of the text content at all. (Whether or not that was a good idea, the question is how one gets into ODF with it.)

Now if translation were part of the charter and charge of the Open Document Technical Committee (if you can find it let me know), and some kind of universal document model were achieved, I would expect that
I find nothing like that. Anywhere.

Comments:

The universal doc comment was amusing. ODF can't even transform the Office XML formats w/o loss of information which is probably the primary value of the Office XML formats (being able to move legacy binary formats to fully documented XML formats w/o data loss). I'm still waiting to see what hoops MA is going to jump through to get complex legacy documents into ODF without loss of data and/or time and money that could've been avoided if they allowed Office XML. I see another Munich Linux migration-styled mishap.

I do think MS should consider formalizing the Office XML and XPS Reach formats with ISO after they finalize the specs. I personally have no issue with the current licensing, but going to ISO would allow them to have a base standard everyone could use as a common denominator for interchange while allowing them to continue making further enhancements to their formats for future releases, much like they're doing currently with .NET and the CLI, C#, C++/CLI standards and Adobe w/ PDF. Maybe this has already been considered though, and they are waiting for final specs before submission.
template created 2004-06-17-20:01 -0700 (pdt)
by orcmid |