Orcmid's Lair

Welcome to Orcmid's Lair, the playground for family connections, pastimes, and scholarly vocation -- the collected professional and recreational work of Dennis E. Hamilton

2004-06-02

 

If It Says Libbys Libbys Libbys on the Label Label Label ... That's Not an Identifier!

I uttered that expression in a giddy moment in the early 1970s when we were attempting to introduce identifiers for some purpose into some then-called middleware software.  I can no longer remember why we cared, or whether I thought some profound truth was captured in my exclamation.

And the business of fretting about identifiers lives on.  A number of folk are working over (or under) Mark Pilgrim's recent "How to make a good ID in Atom."  I have my two bits' worth.

There's this thing about identifiers.  Like, exactly what is an identifier?  On the web it usually means a globally-unique identification for some entity, such as a resource or an element or a blog entry or something else entirely, very large or very small, tangible or not, electronic or not.  It is often affixed to or transported with an entity -- a data unit of some sort -- and can often be used to access it, if known.  It might be regarded as metadata.  Or not.  The entity that is distinguished by an identifier might be immutable or not (as when a web page, uniquely identified, changes).  Does the identity remain fixed if the entity moves around, or are different copies each uniquely identified as different instances?  Are there multiple kinds of identity?  Inquiring minds want to know.  At some point, we all begin to squirm, throw up our hands, thrust a stake into the ground, and declare exactly what it is we are identifying and how.  And nothing else.  Period.  Go somewhere else for that philosophical crap.  Take it to the Semantic Web.  Please.  It doesn't take much devil's advocacy to overload and topple over any proposed identification scheme.

The still-sane among us proceed to introduce rules for identifiers and vow to keep it simple.  Deeper problems can be solved later.  Maybe.

By globally-unique is meant that the same identification will never be used by mistake for distinct entities.  The reverse problem of having more than one globally-unique identification for the same entity is considered not so weighty, mostly to avoid backsliding into the tarpit of "what do we mean by the same entity?"

The avoidance of identification collisions is often accomplished by using some centralized issuance mechanism like that for BookCrossing identifiers or International Standard Book Numbers, or international telephone numbers (a hierarchical but, for you and me, centralized authoritarian system).

In cyberspace there is a preference for decentralized mechanisms that allow people to fabricate unique identifications to their heart's content without having to rely on any issuing authority quite so much.  One way is to start with a single globally-unique identifier -- the seed element -- that one already happens to have some exclusive authority for using.  Adopt that as the basis for fabricating more identifications by concatenating it with additional qualifiers of some sort that you will never duplicate.  Use a list of already-used elements or a counter or a clock or some other reliable technique.  You now have a family of globally-unique identifiers that will not collide with others produced using the same principle but a different seed element.
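The seed-plus-qualifier idea can be sketched in a few lines of Python.  This is only an illustration under assumptions of my own: the seeds example.org and example.net are hypothetical, the "/" separator is arbitrary, and the counter lives in memory, so a real minter would have to persist it somewhere durable to keep the "never duplicate" promise across restarts.

```python
import itertools


def make_minter(seed):
    """Return a function that mints identifiers of the form '<seed>/<n>'.

    `seed` stands for an identifier the caller already has exclusive
    authority over (hypothetical values are used below).
    """
    # The qualifier source: a counter that never repeats within this minter.
    counter = itertools.count(1)

    def mint():
        return f"{seed}/{next(counter)}"

    return mint


# Two independent parties, each with their own seed, never collide.
mint_a = make_minter("example.org")
mint_b = make_minter("example.net")
ids = [mint_a(), mint_a(), mint_b()]
```

Neither minter needs to know the other exists; the only thing they have to agree on is the scheme.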

A key requirement of decentralized schemes is agreement on the scheme and on the seed, or on a way to identify which one of several schemes is in use.  That, of course, is what all the fuss is about.

Mark Pilgrim addresses the creation of Atom IDs for Atom elements.  It's a nice read.  The IDs must be unchanging, globally unique URIs.  URI (Uniform Resource Identifier) schemes are governed by IETF specifications and a variety of URI flavors can be used to ensure global uniqueness using techniques of the kind I mentioned above.  Mark provides links to the relevant specifications.

The Atom specification allows any URI that does the job.  This happens in many W3C and IETF specifications where globally-unique identification is required, as in identifying XML Namespaces.  Mark goes on to address good choices for ensuring the "unchanging" aspect and also avoiding using a URI that might be used for a different and possibly changing purpose (such as the permalink of a blog entry).

Finally, Mark mentions one URI scheme that you can use as-is, provided that you have something like a domain name or email address that you are willing to use as the seed.  Mark also warns against URN (Uniform Resource Name) schemes because their registration requirements get in the way of easy decentralization.

It's all clear enough.  You can (and might) start using the tag URI scheme at once, if you are that trusting in its not-yet-official adoption.
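For the curious, here is a rough sketch of what minting a tag URI might look like.  I am assuming the layout given in the tag proposal: the literal "tag:", an authority (a domain name or email address you control), a comma, a date on which you held that authority, a colon, and a specific part of your own choosing.  The function name tag_uri, the example.org authority, and the path are all hypothetical, and the sketch does no validation of its inputs.

```python
from datetime import date


def tag_uri(authority, minted_on, specific):
    """Build a tag URI as 'tag:' authority ',' date ':' specific.

    No validation is attempted; the caller is assumed to pass an authority
    it controlled on `minted_on` and a specific part it will not reuse.
    """
    return f"tag:{authority},{minted_on.isoformat()}:{specific}"


# Hypothetical authority and path, for illustration only.
entry_id = tag_uri("example.org", date(2004, 6, 2), "/blog/identifiers")
```

The date is there so that the identifier stays yours even if the domain name later passes to someone else.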

Tim Bray chimes in that he is not that trusting of the unapproved and possibly-never-to-be-approved tag-URI scheme that Mark favors.  Tim suggests an alternative already-open-ended and already-registered URN scheme that might serve as well.  That's all fine, there is room for this kind of flexibility in choosing an unchanging, globally-unique URI scheme, and we don't all have to choose the same one because they can be commingled without collision.

Where Tim takes greater exception is with Mark's injunction to avoid using the permalink -- the permanent location, a URL (Uniform Resource Locator) -- as the unchanging, globally-unique URI.  I accept Mark's injunction on first principles.  I want an identifier to be just the identifier it is intended to be and have no other job.

Tim's proposal is unnecessarily brittle unless one uses the original permalink as the unchanging, globally-unique URI and never attempts to derive the Atom ID at some future time from the (then-current) permalink URL.  One might handle this on an exception basis, but the requirement is clear: Either arrange to never change a permalink or arrange to allow the permalink and Atom ID to be different at some future time.
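The distinction can be made concrete with a small sketch.  It assumes a hypothetical Entry record that stores the Atom ID as its own field, assigned once when the entry is created, rather than recomputing it from wherever the entry currently lives.  The names Entry, new_entry, and move, and the example.org URLs, are mine and not from Mark's or Tim's pieces.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    permalink: str  # where the entry lives right now; allowed to change
    atom_id: str    # assigned once at creation; never recomputed


def new_entry(permalink):
    # The original permalink doubles as the ID, but only at creation time.
    return Entry(permalink=permalink, atom_id=permalink)


def move(entry, new_permalink):
    # Relocating the entry updates the locator and leaves the ID untouched.
    entry.permalink = new_permalink


entry = new_entry("http://example.org/2004/06/identifiers.asp")
move(entry, "http://example.org/archive/identifiers.asp")
```

The brittle variant would be one that regenerates atom_id from entry.permalink every time the feed is built; it works right up until the first move.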

Tim's observations are useful and workable.  Tim makes great arguments for never (as in never say never) changing permalinks.  I bend over backwards to do that on web sites: If I've given out a URL, I don't know who may have bookmarked it and I want to preserve incoming links.  (That's why this blog uses *.asp permalinks, even though there is no server-side ASP processing at the moment. It's a cheap way to get redirects if I ever need to make one.)  I also want permalink impermanence to be a different problem not overloaded on the Atom ID (or other ID).  Now, we are still working within the Atom requirement for an unchanging, globally-unique URI, and I can have my druthers, and Tim can have his. (Of course, I also want my web sites to work when replicated on CD-ROMs and that means the permalinks are best generated from relative URLs and there, well, the brittleness bites.)

There is another architectural principle or two to add here.  The most important one is that one should not create a brittle arrangement that is too subtle and easy for future developers to forget or never notice.  There's a secondary and useful architectural-durability principle that I was shown by Dick Wilson long ago (early 1970s) in a different context: Choose the approach that is the easiest to change in the future.

Danny Ayers notices Mark's Comments and Tim's Response.  He recognizes what is at issue, and something in a trackback from Chris Dent has me wondering about context.  What about context here?  It is certainly desirable that permalinks be permanent, so is the scope of that desire the same as for the Atom ID?  I think it depends on the use case, and mine about wanting mirrors on CD-ROMs that work means the permalink isn't that permanent or else isn't all that unique as a usable link.  Whatever.

Of course, now purple numbers come into the conversation, and I am not going to go down that road here, unless someone tells me that Atom proposes to satisfy the requirements for Engelbart's Open Hyperdocument System (OHS).

Meanwhile, as Danny points out, Bill de hÓra strikes a new balance.  There's nothing very permanent about URLs based on domain names, and so we need to rethink the Atom ID URI candidates. 

It seems that the semi-permalink needs to be a URL simply because it is supposed to be a way to locate the entry.  It now looks to me that a UUID-based URI is more valuable as an identifier of the Atom ID sort.  Those come from the permanent ID in my network interface card, and I can arrange to destroy that puppy if I ever stop using it.  Whether or not I do that, the UUIDs that I generate while I have the card are ones that will never be generated again.
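As a rough illustration, Python's standard uuid module can produce this kind of identifier.  The sketch below assumes that the "urn:uuid:" spelling is an acceptable URI form for the purpose, which is my assumption and not something settled here.  uuid1 combines a timestamp with the host's hardware address; if Python cannot find a hardware address it quietly substitutes a random node value, so the tie to a particular network card is not guaranteed.  The function name new_atom_id is hypothetical.

```python
import uuid


def new_atom_id():
    """Return a time-and-node based (version 1) UUID in URN form."""
    # uuid1() mixes the current timestamp with the host's node ID
    # (normally the network card's hardware address).
    return uuid.uuid1().urn


first = new_atom_id()
second = new_atom_id()
```

Each call yields a fresh value of the form urn:uuid: followed by the usual 36-character UUID, and nothing about it depends on a domain name or a permalink.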

And this may still be beyond context for Atom IDs.
Listening to:
Teddy Wilson, Benny Carter, Red Norvo, Louis Bellson, Remo Palmier, George Duvivier, Freddie Green.  Swing Reunion, Disc No. 1.  Digitally-recorded CD, Book-of-the-Month Records (1985).  Listening on Media Player 9.

Hard Hat Area

an nfoCentrale.net site

created 2002-10-28-07:25 -0800 (pst) by orcmid
$$Author: Orcmid $
$$Date: 04-11-25 22:45 $
$$Revision: 2 $