The initial version of our HTML parser was based on the SGML concrete
syntax and the then-current (about 10/94) HTML 2.0 draft specification.
It worked great on all our test documents. It worked pretty well on the
pages we tried at CERN.
It failed all over the place when we put it into beta testing. As a result,
to achieve market acceptance we had to break the lexical level, the content
model, and other aspects of our parser until it duplicated the parsing
errors of the NCSA & Netscape parsers. We still get complaints.
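To make the trade-off concrete, here is a small illustrative sketch (not our actual parser; the class and error strings are hypothetical) contrasting the strict nesting discipline an SGML-conformant parser would enforce with the lenient "tag soup" acceptance that pages in the wild demanded. It uses Python's `html.parser.HTMLParser`, which is itself lenient in the browser tradition:

```python
from html.parser import HTMLParser

# Hypothetical strict checker: enforce properly nested start/end tags,
# roughly what an SGML-conformant content model would require.
class StrictNesting(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags
        self.errors = []   # nesting violations found

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # A conforming document must close the most recently opened tag.
        if not self.stack or self.stack[-1] != tag:
            self.errors.append(f"mis-nested </{tag}>")
        else:
            self.stack.pop()

# Mis-nested markup of the kind real pages were full of:
soup = "<b><i>bold italic</b></i>"

strict = StrictNesting()
strict.feed(soup)
print(strict.errors)      # the strict view reports a nesting violation

lenient = HTMLParser()
lenient.feed(soup)        # the lenient view swallows it without complaint
```

A strict parser flags the overlapping `</b>`; a lenient one renders the page anyway, and once the dominant browsers did the latter, authors never learned there was a problem.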
Until Netscape and NCSA raise the bar by combining a tighter parser with
an incentive to upgrade (some cool new feature, like maybe Java applets),
we have little choice but to leave our parser broken. Now, I personally
think it would be nice if NetShark would become the market leader and thus
allow me to directly affect issues like this, but right now there's a ways
to go :), and this is part of why. Right now, a genuine Netscape(tm) broken
parser is a competitive advantage.
InterCon Systems Corporation