Re: The Reference Concrete Syntax is not Current Practice (Was Re: Standards, Work Groups, and Reali

Arjun Ray (aray@pipeline.com)
Wed, 27 Sep 95 01:52:20 EDT
On Tue, 26 Sep 1995, Glenn Adams wrote:

>
> Date: Mon, 25 Sep 95 23:37:27 EDT
> From: Arjun Ray <aray@pipeline.com>
>
> SGML takes no prisoners: parsing and validation are identical concepts.
> Scream-and-die when something doesn't validate is the SGML way.
>
> It may be the way of certain implementations, but it certainly need not be
> the case. Given that error recovery is not mentioned, let alone standardized,
> by ISO 8879, an implementation has all the leeway it wishes to implement error
> recovery. Of course, problems lurk here too, since certain lexical errors
> may not be noticed at all in real SGML, e.g.:
>
> <A HREF="foo.htm>click here</A> and <A HREF=bar.htm">click there</A>
------------------------------------------------^

Believe it or not, apropos missing quotes, I've seen this too! :-)

> while this might be noticed by a non-compliant parser, e.g., one that
> terminates a tag with &#62; ('>') irrespective of whether it appears in
> an attribute value literal.

I'm concerned with the fallout from a non-compliant parser that
terminates a tag even before it knows there's an attribute value literal
lurking.
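The two readings of Glenn's example can be sketched in a few lines of Python. This is a simplification, not a conforming SGML parser: it assumes any quote character inside a tag opens an attribute value literal (real SGML only allows one after "="), and the function names are hypothetical. The compliant scan treats '>' inside the literal as data; the naive one stops at the first '>'.

```python
import re


def scan_start_tag_sgml(text, pos=0):
    """Return (tag, end) for the start tag at text[pos], SGML-style.

    A '>' inside a quoted attribute value literal is ordinary data,
    so the tag ends at the first '>' outside any literal.
    Simplification (assumption): any quote inside the tag opens a literal.
    """
    if text[pos] != "<":
        raise ValueError("not a start tag")
    quote = None
    for i in range(pos + 1, len(text)):
        ch = text[i]
        if quote:
            if ch == quote:          # closing delimiter of the literal
                quote = None
        elif ch in "\"'":            # opening delimiter of a literal
            quote = ch
        elif ch == ">":              # tag close, outside any literal
            return text[pos:i + 1], i + 1
    raise ValueError("unterminated start tag")


def scan_start_tag_naive(text, pos=0):
    """Non-compliant scan: the tag ends at the first '>', quotes or not."""
    end = text.index(">", pos)
    return text[pos:end + 1], end + 1


line = '<A HREF="foo.htm>click here</A> and <A HREF=bar.htm">click there</A>'

tag, end = scan_start_tag_sgml(line)
print(re.search(r'HREF="([^"]*)"', tag).group(1))
# foo.htm>click here</A> and <A HREF=bar.htm
print(line[end:])
# click there</A>

tag, end = scan_start_tag_naive(line)
print(tag)
# <A HREF="foo.htm>
```

The conforming reading raises no error at all: it yields a single anchor, with a very odd HREF, around "click there". The naive scan instead closes the tag right after foo.htm, so "click here" becomes link text.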

> The fragility of the Concrete Syntax reflects this, as does the
> unwillingness to distinguish lexical tokenization from content model
> enforcement as essentially *different* meanings of "parsing".
>
> In what way is the CS more fragile than other formal languages, e.g., C++,
> LISP, etc? How should/could lexical and syntactic levels be distinguished
> in ISO 8879? Isn't this distinction simply an aspect of an implementation
> rather than an aspect of the language? How does a language specification
> like C++ distinguish this difference?

Generally speaking, by having as small a set as possible of clear-cut
rules to distinguish data from operators. The CS is overly baroque,
considering how easy it has *proven* to get wrong among the
less-than-expert. I posted a detailed explanation of comment syntax on
comp.infosystems.www.authoring.html, with references, and yet two days
later someone else posted that an HTML comment begins with "<!--" and ends
with "-->": *dangerously* misleading precisely because it is partly correct.
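The difference can be sketched in Python. The rule below is a simplified reading of the ISO 8879 comment declaration: "<!" followed by zero or more comments, each opened and closed by "--", with only whitespace allowed between comments and before the closing ">". The function names are hypothetical, and this is not a full markup-declaration parser. The folk "ends with -->" rule agrees on simple cases but parts company on input like <!-- a -- --> b -->.

```python
def scan_comment_decl_sgml(text, pos=0):
    """Return the end index (exclusive) of the comment declaration at text[pos].

    Simplified ISO 8879 rule: "<!" then zero or more comments, each
    delimited by "--" ... "--", separated only by whitespace, then ">".
    """
    if not text.startswith("<!", pos):
        raise ValueError("not a markup declaration")
    i = pos + 2
    if text.startswith(">", i):       # "<!>" is an empty comment declaration
        return i + 1
    while True:
        if not text.startswith("--", i):
            raise ValueError(f"expected '--' or '>' at index {i}")
        close = text.find("--", i + 2)  # a comment ends at the next "--"
        if close == -1:
            raise ValueError("unterminated comment")
        i = close + 2
        while i < len(text) and text[i] in " \t\r\n":  # separators
            i += 1
        if text.startswith(">", i):   # declaration close
            return i + 1


def scan_comment_naive(text, pos=0):
    """The folk rule: a comment starts with "<!--" and ends at the first "-->"."""
    end = text.find("-->", pos + 4)
    if end == -1:
        raise ValueError("unterminated comment")
    return end + 3


def is_valid_comment_decl(text):
    """True if the whole string is exactly one well-formed comment declaration."""
    try:
        return scan_comment_decl_sgml(text) == len(text)
    except ValueError:
        return False


s = "<!-- a -- --> b -->"
print(s[:scan_comment_decl_sgml(s)])  # the whole string: one declaration, two comments
print(s[:scan_comment_naive(s)])      # <!-- a -- -->   (" b -->" left over as text)
print(is_valid_comment_decl("<!-- a -- b -->"))  # False
```

The last case fails because once the second "--" closes the first comment, " b " is neither whitespace nor the start of another comment.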

At any rate, I think Dan Connolly said all that needed to be said in
"SGML is an _ugly_ solution to an _elegant_ problem"
(<URL:http://www.acl.lanl.gov/HTML_WG/html-wg-95q2.messages/0271.html>).
As for the CS, our problem is to deal with it. We can criticise it over
beers at a bar. :-)

Arjun Ray
(I speak for myself only.)