[html4all] HTML Specifications
Robert J Burns
rob at robburns.com
Sun Feb 1 12:30:46 PST 2009
Hello 4All,
I've made some substantial progress on developing an HTML4All HTML
specification/specifications. I've done a lot to re-factor the way
HTML is framed, mostly to avoid the common confusion over technical
terms (such as "HTML document", which can mean an HTML document
regardless of its serialized form or can mean a non-XHTML document),
but I think there are many other benefits too. Based on earlier
messages I sent to the HTML WG, I followed my own advice and separated
the parsing from the vocabulary from the browser behavior. My thinking
is that, other than parsing and the presentation of HTML vocabulary, I
will mostly rely on referencing the HTML5 specification in terms of
browser behavior. For the presentation of HTML vocabulary, I expect to
describe that mostly in CSS terms (though there are some things that
have no CSS analog) so any CSS conforming UA will be able to easily
support the presentation of our HTML (which I'm modestly calling HTML
4.1 for lack of another name).
[Attachment scrubbed by the list archive: HTML4AllStack.png (image/png, 39152 bytes)
<http://wilbur.bytowninternet.com/pipermail/list_html4all.org/attachments/20090201/1e49c60d/attachment-0001.png>]
Parsing:
The parsing algorithm adds many forward-compatible features that Ian
has rejected on the claim that no browser is going to make any changes
to its parsing (so apparently we specify the incorrect parsing because
we take the fatalist position that "oh well, no one will implement the
HTML5 parsing algorithm anyway"). Some of the changes I added to the
parsing algorithm are already supported in one browser or another. For
example, I added the WebKit behavior of allowing a self-closing tag on
'script' elements with a 'src' attribute. I've also added support for
new and unknown elements in the head, which some browsers already
support (Opera, I think, off the top of my head).
I added namespace-aware parsing to the parsing algorithm so that not
only are 'html', 'mathml', and presumably 'svg' elements placed in
their respective namespaces, but elements using any author-declared
namespace are also placed in the appropriate namespace. This
namespace-aware parsing is not all that different from IE's text/html
parser (though IE admittedly is a bit less namespace-aware in its
resulting document).
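
To make the namespace idea concrete, here is a minimal Python sketch of
how a start tag might be resolved against author-declared namespaces.
It is only an illustration under my own assumptions, not the HTML4All
algorithm itself: the function name resolve, the scope dictionary, the
use of 'math' and 'svg' as elements that switch the default namespace,
and the handling of undeclared prefixes are all hypothetical.

```python
# Hypothetical sketch: resolve tag names against author-declared
# namespaces, roughly as a namespace-aware text/html parser might.
HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"

# Elements assumed to switch the default namespace implicitly (assumption).
IMPLICIT_DEFAULTS = {"math": MATHML_NS, "svg": SVG_NS}


def resolve(tag, attrs, scope):
    """Return ((namespace, local_name), new_scope) for one start tag.

    `scope` maps prefixes to namespace URIs; the key "" holds the
    default namespace inherited from the parent element.
    """
    scope = dict(scope)  # copy so sibling elements are unaffected
    for name, value in attrs.items():
        if name == "xmlns":
            scope[""] = value
        elif name.startswith("xmlns:"):
            scope[name[len("xmlns:"):]] = value
    prefix, sep, local = tag.partition(":")
    if sep and prefix in scope:
        return (scope[prefix], local), scope
    if not sep and tag in IMPLICIT_DEFAULTS and "xmlns" not in attrs:
        scope[""] = IMPLICIT_DEFAULTS[tag]
    # Undeclared prefixes are kept as part of the local name (assumption).
    return (scope.get("", HTML_NS), tag), scope
```

Starting from a scope of {"": HTML_NS}, a tag such as 'ex:widget' that
carries its own xmlns:ex declaration resolves to that declared
namespace, while an undeclared prefix simply stays in the HTML
namespace with its full name.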
I'm also in the process of explicitly adding the HTML4All elements to
the parsing algorithm, though a browser implementing the parsing
algorithm will automatically work with our newly added elements (which
is the forward compatibility feature I added already).
Serialization:
In the spirit of separating implementation conformance from document
conformance, the parsing algorithm is entirely about implementation
conformance. Serialization on the other hand is entirely about
document conformance (well except for serializing
implementations :-) ). My goal is to have a canonical HTML
serialization (what I'm calling cHTML with c for canonical) that
basically follows the XHTML1.0 appendix C criteria and the new W3C
Media Types note[1] with respect to HTML (and obviously not the
script, DOM, and CSS criteria in that note). This serialization
promotes what many of us insist are best practices in serialization.
Leif has raised with me the desire to allow some source minimization
and I think this could be done with alternate serialization
specifications (which would have some corresponding conformance
checking service). The relaxations that a serialization separate from
the cHTML serialization might add are listed below, in order from best
to worst practice (each item is a slightly worse practice than the one
before it, IMHO):
* element tag minimization, where some closing tags can be omitted
(except for 'p') and opening and closing tags can be omitted on
'html', 'head', 'body', 'tbody', and 'colgroup' (I expect to
require explicit 'tbody' and 'colgroup' in the cHTML serialization)
* omission of the self-closing tag solidus "/" from elements defined
as empty
* boolean attribute minimization (e.g.,
"<object data='url' declare>...</object>" instead of
"<object data='url' declare='declare'>...</object>")
* omission of quotation marks from attribute values in certain
circumstances
* 'p' element closing tag omission. I expect to require the 'p'
closing tag in the cHTML serialization for forward compatibility
reasons. It was clearly a mistake to ever allow 'p' close tag omission,
and requiring the closing tag going forward means that there are no
strange exceptional cases with new elements such as the 'section'
element, which will work in many browsers automatically (and with the
tHTML parsing algorithm) in a forward-compatible way, but will not
implicitly close the 'p' element as it should.
The last two items are particularly troublesome for various reasons I
won't go into now, but I think if we do define an alternate
serialization to the cHTML serialization it should not include the
last two items in the list (though some authors "in the know" will know
how to produce even more minimized syntax that still parses
correctly). In any event, such a serialization will be compatible with
the tHTML parser, but not with an XML parser, so authors can decide to
maintain code only for tHTML and SGML parsing, or for XML, SGML, and
tHTML parsing with the cHTML serialization.
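
As a rough illustration of the difference between a minimized
serialization and a cHTML-style one, here is a small Python sketch
using only the standard library. It is my own approximation, not a
conformance checker: the names Canonicalizer, canonicalize and
parses_as_xml are hypothetical, the EMPTY set is an illustrative subset
of empty elements, and the sketch does not restore omitted closing tags
(such as a missing '</p>').

```python
from html import escape
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements treated as empty here; an illustrative subset, not the spec's list.
EMPTY = {"br", "hr", "img", "input", "link", "meta"}


class Canonicalizer(HTMLParser):
    """Re-emit markup with quoted attributes, expanded booleans and ' />'."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _open(self, tag, attrs, empty):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # minimized boolean attribute, e.g. declare
                value = name
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + (" />" if empty else ">"))

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs, tag in EMPTY)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, True)

    def handle_endtag(self, tag):
        if tag not in EMPTY:  # empty elements were already closed with ' />'
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def canonicalize(source):
    """Return `source` re-serialized in the stricter, XML-friendly style."""
    parser = Canonicalizer()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


def parses_as_xml(source):
    """Report whether an XML parser accepts `source` as well-formed."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False
```

Feeding it a minimized fragment such as
"<object data=url declare>text</object>" yields a form with quoted
values and declare="declare", which an XML parser accepts while the
minimized input is rejected.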
Vocabulary:
This is really the meat of the proposal. This is my attempt to do what
I thought Ian should have been doing all along: listening to the
members of the WG, engaging in dialog with them on and off list, and
weaving their best suggestions into a new HTML vocabulary
specification of which we could all be proud. There are a lot of new
ideas in this vocabulary and therefore there is a lot to absorb in it.
However, I think the features will prove quite intuitive and simple
for authors to use and many things that today require complex
scripting can be done in HTML 4.1 simply and with HTML-style
declarative markup and a conforming browser.
The vocabulary specification is currently a combination of document
and implementation conformance criteria. It might be possible for us
to split this out later if we wanted to, but I think in the meantime
it is good to keep those norms together for editing purposes.
I have already elaborated most of the new HTML4all originated features
have
Strategy:
Some have asked me how we can hope to influence change in this area
without the support of the W3C. Ideally we might gain their support
and perhaps be invited into a genuine public process to develop HTML5.
However, I sincerely believe we can bring these changes about even
without the support of the W3C. Here are the steps I have in mind:
* to publish an HTML recommendation (or recommendations) that put(s)
the needs of users and authors first
* to provide various machine-readable schemas (XSD, RelaxNG, DTD) in
addition to the normative prose of the recommendation to support
alternate UAs
- also provide an online validation/conformance service so authors
can check their documents' conformance to the specification
* to organize and support the implementation of this recommendation in
at least one open source rendering engine
* to organize the tracking of feature requests to bring support for
our HTML recommendation to all the major browsers
- with one reference implementation in place, there should be
significant pressure on other browsers to support these desirable
features
* to develop JavaScript implementations of these features so that
support can be added even to non-conforming implementations (at least
the ones supporting JavaScript)
* to create tutorials to show authors how to use these new features
(e.g., monikers and XForms on anyElement features)
Because of a vibrant open source rendering engine community (KHTML,
WebKit, Mozilla) and several open source browsers (e.g., Shiira), I
think we can lead the way both to better rendering engines and to
better browsers. That is not to say that the commercial vendors will
follow suit. However, much of what we propose degrades gracefully in
other browsers.
Wiki editing approach:
I welcome everyone to get involved with editing these specifications.
I have not done the work of specifying many of the existing features
drawn from HTML4. Leif and I have interacted somewhat about this
privately and I'm convinced that a wiki approach would be ideal to
develop an HTML vocabulary specification that is as clear and concise
as it can be. Ian's criticisms of HTML4 prose are sometimes justified,
though HTML5 is often worse. However, I think starting from the HTML4
descriptions of the semantics of the elements and attributes and
improving upon them in a wiki way will create an excellent
specification.
Also on the new HTML 4.1 features, there's always room to improve upon
my prose. Having new eyes read those and point out what is not clear
and what is redundant would be quite helpful. Also some new features
are described on the original page for this project[4], but have not
been re-created on the new draft page (such as the 'access' element
and the 'marks' attribute).
Currently all of the various chapters (and modules) of the
specification are gathered together on one wiki page[2]. That page has
a corresponding discussion or talk page where deliberation can take
place about the language of the specification (and even the
substantive features). Normally on Wikipedia there is a "no original
research" policy. In contrast, this effort is original research, so
that policy does not apply here. Instead I think we simply should
foster a collegial atmosphere where we work to build consensus and
focus on good faith improvements to HTML. Though I welcome us to think
about new and better ways to provide accessibility in documents, I do
not anticipate that many of the controversies we've faced in the HTML
WG regarding features such as summaries for tables will be a problem
here at HTML4All.
As for the final presentation, order and hierarchy of the prose, I
feel we might change that before it reaches its final form (which may
not be presented on a wiki). However, in the meantime I think the wiki
approach will be an excellent way to shape the prose of this
specification.
The parsing algorithm couldn't go on the wiki since MediaWiki is
rather mediaweak when it comes to support for mildly complex
hierarchy. There is a wiki page about the algorithm[3] and the
corresponding talk page would be a fine place to discuss that
algorithm and changes to it. The algorithm is also quite a complex
read with many interlocking dependencies, so filtering all the edits
to it through one editor is not such a bad idea anyway.
Take care,
Rob
[1]: <http://www.w3.org/TR/xhtml-media-types/>
[2]: <http://html4all.org/wiki/index.php/HTML_Draft>
[3]: <http://html4all.org/wiki/index.php/Parsing_tHTML>
[4]: <http://html4all.org/wiki/index.php/HTML>