Illegally embedding attributes for fun and profit

Imagine a web interface where you drag items into clusters. Dropping an item onto a cluster should make a request that persists the association. To do this, two ids are required - one for the item and one for the cluster. The most natural way to list these attributes would be on the HTML elements themselves. Maybe something like

<div class="item" item-id="478">idea</div>
<div class="cluster" cluster-id="112">group name</div>

(Let’s assume that we can’t simply use the id attribute, because of id collisions or the like.) Unfortunately, the HTML above isn’t valid. Neither item-id nor cluster-id is a recognized HTML attribute, and the HTML 4 spec makes no allowance for adding arbitrary attributes. This is a real shame, as any alternative involves more markup, more scripting or both.

Is knowingly generating invalid HTML such a bad thing? Well, the W3C validator states that

Validity is one of the quality criteria for a Web page, but there are many others. In other words, a valid Web page is not necessarily a good web page, but an invalid Web page has little chance of being a good web page.

Yikes! Sounds quite imperious. So how does this little axiom stand up in the wild?

  • http://ajaxian.com/ is XHTML 1.0 Transitional with 682 errors, 16 warning(s)
  • http://www.google.com/ is HTML 4.01 Transitional with 66 errors, 9 warning(s)
  • http://www.facebook.com/index.php is XHTML 1.0 Strict with 69 errors, 27 warning(s)
  • http://groups.google.com/group/jquery-en is XHTML 1.0 Transitional with 752 errors, 122 warning(s)
  • http://developer.mozilla.org/ is XHTML 1.0 Transitional with 43 errors, 36 warning(s)
  • http://validator.w3.org/ was successfully checked as XHTML 1.0 Strict

Ok, so I no longer care about validation. Read that advisedly - I have no intention of producing tag soup and I’m fully aware that many of the errors above were caused by malformed URLs, but I’m not going to worry if my page fails validation.

What happens when the page is parsed and invalid attributes are encountered? Well, the spec merely makes a recommendation

If a user agent encounters an attribute it does not recognize, it should ignore the entire attribute specification (i.e., the attribute and its value).

In practice, however, browsers do not do this. A WebKit blog post tells us that

Many technically illegal constructs, like misnested tags or bad attribute names, are allowed or safely ignored. This error-handling is relatively consistent between browsers.

This bodes well, but what of the future? Will HTML 5 outlaw or support arbitrary attributes? The news is good: embedded attributes are explicitly supported by HTML 5. They take the form data-*="". Let’s rewrite the attributes above so we’re at least consistent with HTML 5.

<div class="item" data-item-id="478">idea</div>
<div class="cluster" data-cluster-id="112">group name</div>
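
The rename above is mechanical, so it can be sketched in a few lines. This is only a rough illustration: isDataAttributeName and toDataAttributeName are hypothetical helper names, and the check covers just the rules I’m sure of from the HTML 5 draft (the name starts with data-, has at least one character after the hyphen and contains no ASCII uppercase letters).

```javascript
// Sketch only: both helpers are hypothetical, not part of jQuery or any spec API.

// Rough check of the HTML 5 naming rules for custom data attributes:
// a "data-" prefix, at least one further character, no ASCII uppercase, no whitespace.
function isDataAttributeName(name) {
  return /^data-[^A-Z\s]+$/.test(name);
}

// Turns a home-grown attribute name such as "item-id" into "data-item-id".
// Names that already pass the check are returned lowercased and otherwise unchanged.
function toDataAttributeName(name) {
  const lower = name.toLowerCase();
  return isDataAttributeName(lower) ? lower : "data-" + lower;
}
```

Calling toDataAttributeName("item-id") gives the data-item-id used in the markup above, and running it on a name that is already prefixed leaves it alone.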

So, I’m explicitly generating invalid HTML 4 and, DOCTYPE excepted, valid HTML 5. But the real question - does it work? I haven’t tested in IE6, because as we all know, it’s teh suck, but in other browsers all is well.

A test page consisting of 26 elements, each with two attributes, is consistently aggregated in less than 1ms by Safari. Firefox takes between 8 and 12ms, and IE7 is a little slower, usually about 50ms.
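
For anyone wanting to repeat the measurement, a timing harness might look something like the sketch below. It is not the code behind the numbers above. collectIds is a hypothetical helper, it assumes every element carries both attributes, and it only needs objects with a getAttribute method, so it works on DOM elements as well as stand-ins.

```javascript
// Sketch only: collectIds is a hypothetical helper, not the original test code.
// It reads both data attributes from every element and reports the elapsed time.
function collectIds(elements) {
  const start = Date.now();
  const ids = elements.map(function (el) {
    return {
      itemId: el.getAttribute("data-item-id"),
      clusterId: el.getAttribute("data-cluster-id")
    };
  });
  // Date.now() has millisecond resolution, so very fast runs may report 0.
  return { ids: ids, elapsedMs: Date.now() - start };
}
```

In a browser the elements could come from something like Array.prototype.slice.call(document.getElementsByTagName("div")). A single pass over 26 elements is close to the resolution of the clock, so repeating the call and averaging would give steadier numbers.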

The end result is simple markup that can be inspected, with a minimum of fuss, using a library like jQuery. For example, to establish the relationship between item and cluster, we’d merely write something like the following.

$.post("/relationship", {
  item_id: $draggedItem.attr("data-item-id"),
  cluster_id: $droppable.attr("data-cluster-id")
});
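
If a missing attribute should fail loudly rather than post an empty value, the payload could be built by a small function first. This is a sketch under a few assumptions: relationshipPayload is a hypothetical name, it takes plain DOM elements (anything with getAttribute), and it assumes getAttribute returns null for an absent attribute, which some older browsers don’t honour.

```javascript
// Sketch only: relationshipPayload is a hypothetical helper, not part of jQuery.
// It reads the two ids and returns the data object to hand to $.post.
function relationshipPayload(item, cluster) {
  const itemId = item.getAttribute("data-item-id");
  const clusterId = cluster.getAttribute("data-cluster-id");
  // Refuse to build a request when either id is absent.
  if (itemId === null || clusterId === null) {
    throw new Error("missing data-item-id or data-cluster-id");
  }
  return { item_id: itemId, cluster_id: clusterId };
}
```

With the jQuery objects from the snippet above, the call would be $.post("/relationship", relationshipPayload($draggedItem[0], $droppable[0])), where [0] picks out the underlying DOM element.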

Update: Scott Byers had problems with the selector syntax above: jQuery merely returned an empty set. He got around it by escaping the - character, e.g. $("[data\-foo='bar']").

This entry was posted on Thu, 04 Sep 2008 08:55:00 GMT.