[html4all] HTML Specifications

Robert J Burns rob at robburns.com
Sun Feb 1 12:30:46 PST 2009


Hello 4All,

I've made some substantial progress on developing an HTML4All HTML  
specification/specifications. I've done a lot to re-factor the way  
HTML is framed: mostly to avoid the common confusion of technical
terms (such as "HTML document", which can mean an HTML document
regardless of its serialized form or can mean a non-XHTML document),
but I think there are many other benefits too. Based on earlier  
messages I sent to the HTML WG, I followed my own advice and separated  
the parsing from the vocabulary from the browser behavior. My thinking  
is that, other than parsing and the presentation of HTML vocabulary, I  
will mostly rely on referencing the HTML5 specification in terms of  
browser behavior. For the presentation of HTML vocabulary, I expect to  
describe that mostly in CSS terms (though there are some things that  
have no CSS analog) so any CSS conforming UA will be able to easily  
support the presentation of our HTML (which I'm modestly calling HTML  
4.1 for lack of another name).

-------------- next part --------------
A non-text attachment was scrubbed...
Name: HTML4AllStack.png
Type: image/png
Size: 39152 bytes
Desc: not available
URL: <http://wilbur.bytowninternet.com/pipermail/list_html4all.org/attachments/20090201/1e49c60d/attachment-0001.png>
-------------- next part --------------



Parsing:
The parsing adds many forward compatible features that Ian has  
rejected under the claim that no browser is going to make any changes  
to the parsing (so apparently we specify the incorrect parsing because  
we take the fatalist position that "oh well, no one will implement the  
HTML5 parsing algorithm anyway"). Some of the changes I added to the  
parsing algorithm are already supported in one browser or another. For
example, I added the WebKit behavior of allowing a self-closing tag on
'script' elements with a 'src' attribute. I've also added support for
new and unknown elements in the head, which some browsers already
support (Opera, I think, off the top of my head).
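
As a rough illustration of the 'script' rule above, here is a minimal
Python sketch of the decision a tokenizer might make when it sees a
start tag. The function name, its parameters, and the idea of passing
the attributes as a dict are my own assumptions for illustration; this
is not text from the tHTML algorithm itself.

```python
# HTML4 elements declared EMPTY (they never have content or an end tag).
EMPTY_ELEMENTS = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})


def closes_immediately(name, attrs, has_solidus):
    """Decide whether a start tag also ends its element (hypothetical helper).

    name        -- tag name as written in the source
    attrs       -- dict of attribute names to values
    has_solidus -- True if the tag was written with a trailing "/"
    """
    name = name.lower()
    if name in EMPTY_ELEMENTS:
        # Empty elements close whether or not the solidus is present.
        return True
    if name == "script" and has_solidus and "src" in attrs:
        # The behavior described above: honor "/>" on 'script'
        # only when a 'src' attribute is present.
        return True
    return False
```

With this sketch, `<script src="a.js" />` closes immediately, while
`<script />` without 'src' is still treated as an ordinary start tag.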

I added namespace-aware parsing to the parsing algorithm so that not
only are 'html', 'mathml', and presumably 'svg' elements placed in
their respective namespaces, but elements in any author-declared
namespace are also placed in the appropriate namespace. This
namespace-aware parsing is not all that different from IE's text/html
parser (though IE admittedly is a bit less namespace-aware in its
resulting document).
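
To make the idea concrete, here is a small Python sketch of resolving a
tag name to a namespace from author declarations. The three namespace
URIs are the standard ones; the helper names, the use of 'xmlns:prefix'
attributes as the declaration mechanism, and the fallback for undeclared
prefixes are assumptions on my part, not a transcription of the parsing
algorithm.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"

# Unprefixed element names that switch to a well-known namespace.
KNOWN_ROOTS = {"math": MATHML_NS, "svg": SVG_NS}


def declared_prefixes(attrs, inherited=None):
    """Collect prefix -> URI bindings from xmlns:* attributes (hypothetical)."""
    scope = dict(inherited or {})
    for name, value in attrs.items():
        if name.startswith("xmlns:"):
            scope[name[len("xmlns:"):]] = value
    return scope


def resolve(tag_name, scope, default_ns=HTML_NS):
    """Return (namespace URI, local name) for a tag name."""
    prefix, sep, local = tag_name.partition(":")
    if sep and prefix in scope:
        # Author-declared namespace.
        return scope[prefix], local
    if tag_name in KNOWN_ROOTS:
        return KNOWN_ROOTS[tag_name], tag_name
    # Undeclared prefix or plain name: keep the whole name in the default.
    return default_ns, tag_name
```

For example, with `xmlns:ex="http://example.org/ns"` in scope, the tag
'ex:widget' resolves to the local name 'widget' in that namespace, while
an undeclared 'foo:bar' stays an HTML element with a colon in its name.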

I'm also in the process of explicitly adding the HTML4All elements to  
the parsing algorithm, though a browser implementing the parsing  
algorithm will automatically work with our newly added elements (which  
is the forward compatibility feature I added already).

Serialization:
In the spirit of separating implementation conformance from document  
conformance, the parsing algorithm is entirely about implementation  
conformance. Serialization on the other hand is entirely about  
document conformance (well except for serializing  
implementations :-) ). My goal is to have a canonical HTML  
serialization (what I'm calling cHTML with c for canonical) that  
basically follows the XHTML1.0 appendix C criteria and the new W3C  
Media Types note[1] with respect to HTML (and obviously not the  
script, DOM, and CSS criteria in that note). This serialization  
promotes what many of us insist are best practices in serialization.

Leif has raised with me the desire to allow some source minimization  
and I think this could be done with alternate serialization  
specifications (which would have some corresponding conformance  
checking service). Below are the possible alterations that a
serialization separate from the cHTML serialization might allow, in
order of best practice (each item in the list is a little worse
practice than the one before it, IMHO):

  * element tag minimization, where some closing tags can be omitted
    (except for 'p') and both opening and closing tags can be omitted
    on 'html', 'head', 'body', 'tbody', and 'colgroup' (I expect to
    require explicit 'tbody' and 'colgroup' in the cHTML serialization)
  * omission of the self-closing tag solidus "/" from elements defined
    as empty
  * boolean attribute minimization (e.g.,
    "<object data='url' declare>...</object>" instead of
    "<object data='url' declare='declare'>...</object>")
  * omission of quotation marks from attribute values in certain
    circumstances
  * 'p' element closing tag omission. I expect to require the 'p'
    closing tag in the cHTML serialization for forward compatibility
    reasons. It was clearly a mistake to ever allow 'p' close tag
    omission, and requiring the closing tag going forward means that
    there are no strange exceptional cases with new elements such as
    the 'section' element, which will work in many browsers
    automatically (and with the tHTML parsing algorithm) in a forward
    compatible way, but will not implicitly close the 'p' element as it
    should.
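
To illustrate the 'p' problem in the last item, here is a deliberately
simplified Python sketch. It assumes a parser that closes an open 'p'
only when it meets a start tag from a fixed list of known elements; the
list contents and the function are hypothetical and far cruder than any
real parser.

```python
# Elements a (hypothetical) legacy parser knows will end an open 'p'.
P_CLOSERS = frozenset({
    "p", "div", "ul", "ol", "dl", "table", "pre", "blockquote",
    "h1", "h2", "h3", "h4", "h5", "h6", "hr", "form", "address",
})


def parents(start_tags, p_closers=P_CLOSERS):
    """Return (tag, parent) pairs for start tags written with no end tags."""
    stack = ["body"]
    result = []
    for tag in start_tags:
        if tag in p_closers and stack[-1] == "p":
            stack.pop()  # implicit </p>
        result.append((tag, stack[-1]))
        stack.append(tag)
    return result
```

Under these assumptions, 'div' after an unclosed 'p' becomes a sibling
of the 'p', but the unknown 'section' ends up nested inside it. An
explicit '</p>' in the source avoids the difference entirely.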

The last two items are particularly troublesome for various reasons I  
won't go into now, but I think if we do define an alternate  
serialization to the cHTML serialization it should not include the  
last two items in the list (though some authors "in the know" will know
how to produce even more minimized syntax that still parses
correctly). In any event, such a serialization will be compatible with
the tHTML parser, but not with an XML parser, so authors can decide to  
maintain code only for tHTML and SGML parsing or for XML, SGML, and  
tHTML parsing with the cHTML serialization.
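
The XML side of that trade-off is easy to check with any XML parser.
Here is a short Python sketch using the standard library's ElementTree;
the two sample strings are my own examples of a canonical-style and a
minimized serialization, not markup taken from the draft.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Canonical style: explicit end tags, solidus on empty elements,
# quoted values, boolean attribute written out in full.
canonical = '<p>Text<br /><object data="url" declare="declare"></object></p>'

# Minimized style: no solidus, unquoted value, minimized boolean
# attribute, and the closing 'p' tag omitted.
minimized = "<p>Text<br><object data=url declare></object>"
```

Here `parses_as_xml(canonical)` is true and `parses_as_xml(minimized)`
is false, which is the sense in which only the canonical style can be
maintained for XML and text/html parsers at once.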

Vocabulary:
This is really the meat of the proposal. This is my attempt to do what  
I thought Ian should have been doing all along: listening to the  
members of the WG, engaging in dialog with them on and off list, and  
weaving their best suggestions into a new HTML vocabulary  
specification of which we could all be proud. There are a lot of new
ideas in this vocabulary, and therefore there is a lot to absorb in it.
However, I think the features will prove quite intuitive and simple  
for authors to use and many things that today require complex  
scripting can be done in HTML 4.1 simply and with HTML-style  
declarative markup and a conforming browser.

The vocabulary specification is currently a combination of document  
and implementation conformance criteria. It might be possible for us  
to split this out later if we wanted to, but I think in the meantime  
it is good to keep those norms together for editing purposes.

I have already elaborated most of the new HTML4all originated features  
have

Strategy:
Some have asked me how we can hope to influence change in this area  
without the support of the W3C. Ideally we might gain their support
and perhaps be invited into a genuine public process to develop HTML5.
However, I sincerely believe we can bring these changes about even
without the support of the W3C. Here are the steps I have in mind:
* publish an HTML recommendation (or recommendations) that puts the
  needs of users and authors first
* provide various machine-readable schemas (XSD, RELAX NG, DTD) in
  addition to the normative prose of the recommendation to support
  alternate UAs
    - also provide an online validation/conformance service so authors
      can check their documents' conformance to the specification
* organize and support the implementation of this recommendation in
  at least one open source rendering engine
* organize the tracking of feature requests to bring support for
  our HTML recommendation to all the major browsers
    - with one reference implementation in place, there should be
      significant pressure on other browsers to support these
      desirable features
* develop JavaScript implementations of these features so that support
  can be added even to non-conforming implementations (at least the
  ones supporting JavaScript)
* create tutorials to show authors how to use these new features
  (e.g., monikers and XForms on anyElement features)

Because of a vibrant open source rendering engine community (KHTML,
WebKit, Mozilla) and several open source browsers (e.g., Shiira), I
think we can lead the way both to better rendering engines and to  
better browsers. That is not to say that the commercial vendors will  
follow suit. However, much of what we propose degrades gracefully in  
other browsers.

Wiki editing approach:
I welcome everyone to get involved with editing these specifications.  
I have not done the work of specifying many of the existing features  
drawn from HTML4. Leif and I have discussed this somewhat in
private, and I'm convinced that a wiki approach would be ideal to
develop an HTML vocabulary specification that is as clear and concise
as it can be. Ian's criticisms of HTML4 prose are sometimes justified,
though HTML5 is often worse. However, I think starting from the HTML4
descriptions of the semantics of the elements and attributes and  
improving upon them in a wiki way will create an excellent  
specification.

Also on the new HTML 4.1 features, there's always room to improve upon  
my prose. Having new eyes read those and point out what is not clear  
and what is redundant would be quite helpful. Also some new features  
are described on the original page for this project[4], but have not  
been re-created on the new draft page (such as the 'access' element  
and the 'marks' attribute).

Currently all of the various chapters (and modules) of the  
specification are gathered together on one wiki page[2]. That page has  
a corresponding discussion or talk page where deliberation can take  
place about the language of the specification (and even the  
substantive features). Normally on Wikipedia there is a "no original
research" policy. In contrast, this effort is original research, so
that policy does not apply here. Instead I think we simply should
foster a collegial atmosphere where we work to build consensus and  
focus on good faith improvements to HTML. Though I welcome us to think  
about new and better ways to provide accessibility in documents, I do  
not anticipate that many of the controversies we've faced in the HTML  
WG regarding features such as summaries for tables will be a problem  
here at HTML4All.

As for the final presentation, order and hierarchy of the prose, I
feel we might change that before it reaches its final form (and is not
presented on a wiki). However, in the meantime I think the wiki
approach will be an excellent way to shape the prose of this
specification.

The parsing algorithm couldn't go on the wiki since MediaWiki is
rather mediaweak when it comes to support for mildly complex
hierarchy. There is a wiki page about the algorithm[3], and the
corresponding talk page would be a fine place to discuss that
algorithm and changes to it. The algorithm is also quite a complex
read with many interlocking dependencies, so filtering all the edits
for it through one editor is not such a bad idea anyway.

Take care,
Rob

[1]: <http://www.w3.org/TR/xhtml-media-types/>
[2]: <http://html4all.org/wiki/index.php/HTML_Draft>
[3]: <http://html4all.org/wiki/index.php/Parsing_tHTML>
[4]: <http://html4all.org/wiki/index.php/HTML>

