[html4all] Interview: HTML 5 Editor Ian Hickson discusses features, pain points, adoption rate, and more

Leif Halvard Silli lhs at malform.no
Sat Aug 30 14:47:45 PDT 2008


Robert J Burns 2008-08-30 21.43:


>> [...] error handling, if defined back then, could have differed
>> from both XML and HTML 5.
> 
> True, it could differ. But the only error handling that would reduce  
> 'tag soup' as I'm using the term would be XML style draconian error  
> handling. There are other benefits of defined error handling, but  
> reducing 'tag soup' is not one of them unless it is draconian error  
> handling (again I think you and Ian are using the phrase differently,  
> but I haven't figured out how you're using it yet).


Syntax errors, you said. That's what I meant as well.

 
>>> I think that would have been great, but that's not the Ian I know.
>>
>> In danger of repeating: XML and Ian agree that HTML should have
>> had error handling from the start. They also agree that this
>> would have created less (with XML, theoretically close to zero)
>> tag soup. Both also try to answer the challenge from the past. But
>> they solve this challenge from the past differently.
> 
> Only XML style error handling reduces tag soup. Again, whatever other  
> benefits it may have, Ian-style error handling does not reduce 'tag  
> soup'.

First you say "Only XML style error handling ...". Then you say 
"Ian-style error handling does not ...". Above we agreed that "it 
could differ" from both Ian's and XML's handling.

Ian-style handling would not have been possible from the start, 
because so many of the errors we have now had not yet developed 
back then!

[...]

> So how could it cause a reduction in 'tag soup'?


Answer:

 
>> Still, if all browsers were required to handle errors and
>> exceptions "mildly", but in the same way, then they would not have
>> had to invent this handling by themselves. This would have led
>> to fewer errors being accepted. Unlike today, when some browser
>> does it that way, another that way, a third that way - and as a
>> result, each browser has to support all the different ways of
>> handling errors, in order to be compatible.
> 
> So are you and Ian using the phrase 'tag soup' to describe the various  
> ways browsers handle errant HTML? 


To say "treat something as tag soup" has become synonymous with 
"parse something as text/html". This, again, means parsing HTML 
with the undefined error handling that has developed over time.

This is not how I used the term, however.

What I said was that the fact that UAs handle, bless, excuse or 
reject tag soup/syntax errors based on largely common but 
undefined error handling rules has let many more errors (= much 
more tag soup) grow and develop than we would have seen if error 
handling had been defined from the start.

> The reason XML error handling reduces  
> tag soup is that the author is made immediately aware of the errors  
> (the soupiness of their tags) even without using a validator or  
> conformance checker. Upon seeing their soup they fix it so that they  
> can publish their documents. Ian-style error handling masks the  
> errors of the authors (and their authoring tools) from the authors.  


Do you use 'Ian-style' as a synonym for "the different but 
somewhat similar error handling that all HTML UAs have had to 
develop on their own because it, until now, had not been defined"?

I don't think I or anyone else sees the problems more clearly just 
because you attach Ian's name to them.

> They do not know they are creating tag soup because they test it in  
> their browser and it works. With standardized Ian-style error handling  
> authors still can't see their errors upon testing in a browser.


That very thing is nothing new. The only UA I have used which 
constantly warns about errors is iCab. And that's why I use it.

> In  
> fact they may test it in a dozen HTML5 browsers and still not see the  
> soupiness of their tags. Perhaps you (and Ian) are saying that the  
> errors are no longer tag soup because they are handled in a consistent  
> manner? 


Do I say that the errors are no longer errors because they now are 
handled in a consistent manner? No, of course not.

Again, Ian spoke historically: He predicted that there would have 
been fewer errors in the HTML code around the globe if error 
handling had been defined from the start.

Let's say we added error handling to HTML 4 *today*. This would 
not reduce the number of errors in existing code unless UAs, as a 
result, started to treat some errors in a way that annoyed the 
designers. However, for the future, I have hope that it could 
stop the number of errors from growing.

As for HTML 5, yes, it appears to be the case that some things 
that were considered tag soup (aka 'errors') in HTML 4 will 
become "well done" in HTML 5. But that is a completely different 
issue.

The crux seems to be that you think that only XML style handling 
can reduce the number of errors. There we disagree. But I may have 
failed to convince you about that ...

> So then we have a new distinction (at least new to me):  
> conforming documents and errant documents where errant documents can  
> be broken down into those containing tag soup and those that don't.  
> For example <object>fallback</object data='file.mpeg' > would be a tag  
> soup errant document, but <b><i>some bold italics</b></i> would be an  
> errant document, with no tag soup. I'm just honestly trying to  
> understand how you're using the term tag soup here.
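
The mis-nested <b>/<i> snippet quoted above can be used to
illustrate the two handling styles under discussion. What follows
is only a minimal sketch using Python's standard library as a
stand-in: xml.etree.ElementTree plays the draconian XML parser,
and html.parser.HTMLParser plays a forgiving one. HTMLParser is a
simple event-based parser, not an implementation of the HTML 5
parsing algorithm, and the names SNIPPET, TagLogger,
parse_draconian and parse_lenient are hypothetical, chosen for
this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The mis-nested example from the quoted text, wrapped in <p> so the
# XML parser sees a single root element.
SNIPPET = "<p><b><i>some bold italics</b></i></p>"


class TagLogger(HTMLParser):
    """Records start/end tag events without rejecting anything."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def parse_draconian(markup):
    """Return the error message if the XML parser rejects the markup, else None."""
    try:
        ET.fromstring(markup)
        return None
    except ET.ParseError as err:
        return str(err)


def parse_lenient(markup):
    """Return the tag events a forgiving parser reports for the markup."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


if __name__ == "__main__":
    # The XML parser stops at the first mis-nested end tag and reports it.
    print("draconian:", parse_draconian(SNIPPET))
    # The forgiving parser reports every tag and raises nothing.
    print("lenient:  ", parse_lenient(SNIPPET))
```

In this sketch the author only learns about the mis-nesting from
the draconian parser; the forgiving one hands the tags on
silently, which is the masking effect described in the quoted
text.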

Ah, yes, true, some link 'tag soup' to the use of <b> and <i> 
instead of <strong> and <em>. I've done so myself, I think. It is 
unclear to me what you mean here, though. If one takes the stance 
that <b> and <i> will become forbidden in HTML 5, then using them 
even though they are forbidden would be an error and thus tag 
soup, which needs defined error handling. The common error 
handling of old tags seems to be to accept them and respect their 
meaning.

>>>> I hope at least those elements you mention /are/ included in the
>>>> <p> model, when HTML 5 is ready ...
>>>
>>> No, Leif. There won't be any tables, lists or block quotes in the  
>>> HTML paragraph element. They were in the original WG draft last 
>>> year (only for the XML serialization), but they've now been 
>>> removed (presumably because of Ian's impeccable research and logic).
>>
>> These are of course steps in the wrong direction.
> 
> Like so many steps over the last 18 months.

Indeed.

>>>> Well, is microformats a way to extend or use HTML?
>>>
>>> Yes, but not the only way. That's why I was asking you to clarify.
>>
>> I would call it "using HTML".
> 
> Yes, I would too. However, it's also using HTML in a way that extends  
> the expressiveness and semantics of HTML. If HTML (meaning here  
> specifically text/html HTML) supported XML namespaces-like  
> extensibility, then using those namespaces for extending HTML would  
> also be using HTML. Again, I think the microformats approach is a much  
> clumsier way to do extensibility because it stomps over other ways  
> authors commonly use HTML.

OK. But as I said: In LyX you can e.g. opt to write a letter. And 
then you will only be able to add things to that letter which are 
tolerated by that template.

Unfortunately, when someone starts to write an HTML document, 
it is not given what it should become, what form it should have 
and so on. You have a lot of semantic elements ready to use. But 
how you combine them is not obvious.

>>>> On the general level, OK. But the editor must be targeted at the
>>>> specific document. Only then can WYSIWYM work, I think.
>>>
>>> Again, I don't know what you're saying here. Editors do target
>>> specific documents.
>>
>> Specific /types/ of documents.
> 
> I'm still not understanding you. Could you list some examples of the  
> types of documents you are referring to. I'm thinking of an editor  
> that targets specifically an HTML type document, or an XHTML type  
> document. However, I can tell here you must be using type in a  
> different way.

Yes, I mentioned previously some types: Letter, Article, and so 
on. [Sorry, should have explained better but ...] May I ask what 
course you think HTML editors should have taken? I don't think you 
meant that one should be hand coding instead of using WYSIWYG.
-- 
leif halvard silli


