From joe@trystero.art.com Tue Jan 25 21:38:19 2000
Date: Mon, 2 Oct 95 10:11:29 EDT
From: Joe English
To: Multiple recipients of list
Subject: PROPOSAL: An extension mechanism for HTML

An Extension Mechanism for HTML

Version: 1.0
J. English
30 September 1995

ABSTRACT

HTML currently lacks a well-defined mechanism for developing and deploying new features. This proposal addresses a small part of this problem at the SGML level by adding a general-purpose ``alternate representation'' element. Content providers may use this element to supply an alternate representation for browsers which cannot present or do not understand extended HTML features. A new scheme for handling unrecognized elements in HTML user agents is defined, and a brief list of guidelines for designing HTML extensions is presented.

Issues of media type parameters for extended versions of HTML and mechanisms for actually extending the HTML DTD are _expressly not considered or addressed_ in this proposal.

STATUS OF THIS MEMO

This is a working draft, being circulated for comment only. If there is sufficient support for this proposal it will be submitted as an Internet-Draft. Please send comments and suggestions to the author, the mailing list, or the mailing list.

CONTENTS

   1    Statement of the problem
   2    Proposed solution
   3    Changes to DTD
   4    Impact on existing browsers and tools
   5    Impact on existing documents
   6    Deployment and interoperability
   7    Format negotiation
   8    Guidelines for extension elements
   9    Potential problems
   10   Acknowledgments and history
   A    Other solutions
   A.1  ALT attribute instead of element
   A.2  ALTSRC attribute
   A.3  NOxxx elements
   A.4  Conditional element
   A.5  Marked sections
   A.6  No tags
   A.7  Omissible tags

1. STATEMENT OF THE PROBLEM

_How do we teach current browsers to understand elements that haven't been invented yet?_

The HTML document type definition is still far from complete.
There are several widely deployed new features which are not represented in the HTML 2.0 DTD, several more which have been proposed, and there will no doubt be even more in the future. At the same time, there is a large installed base of HTML user agents which (by definition) do not support newly invented HTML extensions. It is not feasible for developers or users to update all of their software simultaneously every time a new extension is developed. Therefore one or more mechanisms for providing backward compatibility with the installed base are desperately needed.

2. PROPOSED SOLUTION

A new, general-purpose ``alternate representation'' element is defined as follows:

      <!ELEMENT ALT - - %body.content>

That is, ALT may contain anything that is legal inside the BODY element, the start- and end-tags are required, and it has no attributes.

The ALT element is not allowed in the content of any current HTML level 2 elements. Instead, it is intended to be used inside _new_ elements which are not part of the current standard.

The ALT element contains an ``alternate representation'' of its parent element (no matter what that parent element is). The alternate representation should be presented if the user agent is not able to present the rest of the containing element. If the user agent is able to present the containing element, the content of the ALT element should be ignored.

3. CHANGES TO DTD

This proposal entails no changes to the HTML 2.0 DTD, as it addresses HTML extensions only.

In future extensions to HTML, any newly defined elements which can appear as direct children of current level 2 elements (hereafter, ``extension elements'') may include the ALT element in their content model as an optional first subelement.

   Note: For the purpose of this proposal, new elements which appear only inside extension elements are not considered extension elements themselves.
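The presentation rule of section 2 can be sketched over an already-parsed document tree, as a user agent with a true SGML parser might hold it. This is a minimal, hypothetical Python illustration, not part of the proposal: the Element class and the helpers split_alt, present, and text_of are invented names, and the can_present predicate stands in for whatever test a user agent applies. Because can_present receives each element instance, it could also decline a supported element (say, an EMBED whose object is unavailable), as section 4 allows.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple, Union


@dataclass
class Element:
    """Hypothetical parsed element: a generic identifier plus children."""
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


Node = Union[Element, str]


def split_alt(elem: Element) -> Tuple[Optional[Element], List[Node]]:
    """Separate a leading ALT subelement from the primary content."""
    for i, child in enumerate(elem.children):
        if isinstance(child, str) and not child.strip():
            continue  # ignore whitespace before a possible ALT
        if isinstance(child, Element) and child.name == "ALT":
            return child, elem.children[i + 1:]
        break
    return None, elem.children


def present(node: Node, can_present: Callable[[Element], bool]) -> List[Node]:
    """Return the nodes a user agent would present in place of `node`."""
    if isinstance(node, str):
        return [node]
    alt, primary = split_alt(node)
    if can_present(node):
        # Presentable: keep the element, ignore its ALT content.
        kids = [n for child in primary for n in present(child, can_present)]
        return [Element(node.name, kids)]
    # Not presentable: use ALT if supplied, otherwise drop only the tags
    # (the HTML 2.0 "ignore unrecognized tags" behaviour).
    source = alt.children if alt is not None else primary
    return [n for child in source for n in present(child, can_present)]


def text_of(nodes: List[Node]) -> str:
    """Flatten presented nodes to their character data."""
    return "".join(n if isinstance(n, str) else text_of(n.children) for n in nodes)


doc = Element("BODY", [
    "Results: ",
    Element("TABLE", [
        Element("ALT", ["(plain-text table)"]),
        Element("TR", [Element("TD", ["42"])]),
    ]),
])

old_browser = {"BODY", "TR", "TD"}  # does not know TABLE
new_browser = old_browser | {"TABLE"}
print(text_of(present(doc, lambda e: e.name in old_browser)))
# prints "Results: (plain-text table)"
print(text_of(present(doc, lambda e: e.name in new_browser)))
# prints "Results: 42"
```

Working on the tree rather than on raw tags is what makes the "no matter what that parent element is" wording cheap to honour: the ALT lookup is the same for every element.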
For example, to let TABLE carry an alternate representation, the definition of the TABLE extension element would be changed by inserting an optional ALT element (ALT?) in front of its existing content model.

Since TR, THEAD, and CAPTION are only allowed inside TABLE, they are not considered extension elements and need not include ALT in their content models. See section 8 (``Guidelines for extension elements'') for other guidelines on designing extensions.

4. IMPACT ON EXISTING BROWSERS AND TOOLS

For cases where an extension element contains no other textual content (such as the proposed EMBED and FRAMESET elements), no change to existing browsers is required, since the ``ignore unrecognized tags'' rule provides automatic backward compatibility. (In fact, for such cases there is no need to use a standardized name for the alternate representation element at all, except possibly for uniformity.)

From the HTML 2.0 specification, section 4.2.1, ``Undeclared Markup Error Handling'' [5]:

      To facilitate experimentation and interoperability between implementations of various versions of HTML, the installed base of HTML user agents supports a superset of the HTML 2.0 language by reducing it to HTML 2.0: markup in the form of a start-tag or end-tag, whose generic identifier is not declared is mapped to nothing during tokenization. [...]

To support other extensions such as TABLE which _do_ contain content that cannot be presented by user agents which do not understand the extension, this guideline shall be amended as follows:

      [...] When encountering markup in the form of a start-tag whose generic identifier is not recognized by the user agent, if it is immediately followed by an <ALT> start-tag, then the content of the ALT element should be presented, and all content between the </ALT> end-tag and the end-tag of the unrecognized element should be discarded. If no ALT subelement is present, then the content of the unrecognized element is treated as if its start- and end-tags were not present.

Note that under this proposal, browsers are expected to keep track of the element hierarchy instead of simply discarding unrecognized tags.
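The amended rule can be sketched as a single pass over a token stream, the way a heuristic (non-SGML) parser might apply it. This is a hypothetical Python illustration, not part of the proposal: the regular-expression tokenizer is deliberately crude (it discards attributes and ignores comments, declarations, and entity references), `known` is an assumed set of generic identifiers the user agent recognizes, and all function names are invented. It tracks the element hierarchy only by counting nested start- and end-tags of the same name, which is the kind of bookkeeping the note above expects of heuristic parsers.

```python
import re
from typing import List, Set, Tuple

Token = Tuple[str, str]  # ("start" | "end" | "text", tag name or text)

# Simplified tag pattern: attributes are matched but discarded.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*>")


def tokenize(html: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    for m in TAG.finditer(html):
        if m.start() > pos:
            tokens.append(("text", html[pos:m.start()]))
        tokens.append(("end" if m.group(1) else "start", m.group(2).upper()))
        pos = m.end()
    if pos < len(html):
        tokens.append(("text", html[pos:]))
    return tokens


def matching_end(tokens: List[Token], i: int, name: str) -> int:
    """Index of the end-tag closing the start-tag at tokens[i], counting
    nested elements of the same name; len(tokens) if it is missing."""
    depth = 0
    for j in range(i, len(tokens)):
        kind, value = tokens[j]
        if kind == "start" and value == name:
            depth += 1
        elif kind == "end" and value == name:
            depth -= 1
            if depth == 0:
                return j
    return len(tokens)


def alt_after(tokens: List[Token], i: int) -> int:
    """Index of an <ALT> start-tag immediately following tokens[i]
    (whitespace aside), or -1 if there is none."""
    j = i + 1
    while j < len(tokens) and tokens[j][0] == "text" and not tokens[j][1].strip():
        j += 1
    return j if j < len(tokens) and tokens[j] == ("start", "ALT") else -1


def apply_alt_rule(tokens: List[Token], known: Set[str]) -> List[Token]:
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        kind, name = tokens[i]
        if kind == "start":
            a = alt_after(tokens, i)
            if name in known:
                out.append(tokens[i])
                # Recognized element: its ALT content is ignored.
                i = matching_end(tokens, a, "ALT") + 1 if a >= 0 else i + 1
            elif a >= 0:
                # Unrecognized, with ALT: present the ALT content and discard
                # everything else up to the element's own end-tag.
                out.extend(apply_alt_rule(tokens[a + 1:matching_end(tokens, a, "ALT")], known))
                i = matching_end(tokens, i, name) + 1
            else:
                i += 1  # unrecognized, no ALT: drop the start-tag only
        elif kind == "end" and name not in known:
            i += 1  # drop the unrecognized end-tag
        else:
            out.append(tokens[i])
            i += 1
    return out


def render(tokens: List[Token]) -> str:
    return "".join(v if k == "text" else f"<{'/' if k == 'end' else ''}{v}>" for k, v in tokens)


doc = ("<P>See <TABLE><ALT><PRE>a  b</PRE></ALT>"
       "<TR><TD>a<TD>b</TABLE> and <BIG>this</BIG>.</P>")
print(render(apply_alt_rule(tokenize(doc), {"P", "PRE"})))
# prints "<P>See <PRE>a  b</PRE> and this.</P>"
```

Note that TR and TD have no end-tags in the example; the TABLE content is still discarded correctly because only the required </TABLE> end-tag is needed to find the end of the unrecognized element.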
Ideally, the element hierarchy will be tracked by a true SGML parser using an extended DTD supplied by the document provider. However, even heuristic parsers should be able to accomplish this.

User agents may also present the alternate content for individual instances of _supported_ extension elements, at their discretion or on the user's instructions. For example, in the case of EMBED, a user may have disabled object embedding, or a particular embedded object may be unavailable; the user agent may use the alternate representation in these cases as well.

5. IMPACT ON EXISTING DOCUMENTS

This proposal does not affect existing documents, except possibly those which already use extended HTML features. The authors of such documents may wish to take advantage of the proposed ALT element if and when sufficient browser support has been deployed.

6. DEPLOYMENT AND INTEROPERABILITY

The current proposal places a large part of the responsibility for backward compatibility on document providers. (Of course, so does any scheme which requires multiple representations of an element to be provided. I feel that the current proposal does more than other schemes to help document providers do so.)

Use of this feature is entirely discretionary, much like the ALT attribute on IMG. It will not place any extra, mandatory burden on authors who wish to use extended or experimental HTML features; however, should they choose to supply an alternate representation, it will make it easier to do so.

The alternate representation can be nearly anything, including a preformatted plain-text rendering of the primary content, a hyperlink to a bitmapped image, or the ever-popular ``click here to download a more advanced browser'' message.

This proposal is also amenable to automatic processing. For example, a preprocessor could scan for TABLE elements which do not contain an author-supplied ALT representation and insert a plain-text rendering of the table.

7. FORMAT NEGOTIATION

It has been suggested on numerous occasions that Web user agents advertise which HTML features they support, and that servers provide a ``down-translated'' version of documents when necessary. At present, there is no clear definition of how this should work at the protocol level. There have been several proposals, notably Dan Connolly's paper ``Toward Graceful Deployment of Tables in HTML'' [1], but none has been widely implemented.

   Note: Several Web sites are known to use the HTTP User-Agent header to determine which version of a document to send. This is a questionable practice, and is error-prone and hard to maintain.

The current proposal has several advantages over format negotiation schemes:

   Format negotiation only works for HTTP and other transport protocols which support it. The current proposal will work for any transport protocol, including none (e.g., local file system access). No modifications to server software are necessary.

   Format negotiation does not provide any solution to the inherently complex problem of maintaining or generating multiple versions of a document. Including alternate representations in the document itself takes advantage of SGML to manage this complexity.

   The current proposal provides more flexibility than automatic down-translation based on format negotiation, since it allows authors to choose a suitable alternate representation for each element instance. It also gives more control to information consumers, who might have no indication that an alternate representation is even available if automatic format negotiation were in use.

8. GUIDELINES FOR EXTENSION ELEMENTS

In order to support heuristic parsers, end-tag omission shall not be allowed for any extension element, nor shall any extension element have EMPTY declared content or content reference attributes.

   Note: Again, new elements which are only legal inside extension elements are not themselves extension elements, so this rule does not apply to them.
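These restrictions are mechanical enough to check against a declaration. The following hypothetical Python sketch (not part of the proposal) inspects a single, simple <!ELEMENT> declaration plus its attribute list declaration, if any; it does not resolve parameter entities or handle name groups, and the element names and content models in the usage example (TABLE, WIDGET) are invented for illustration rather than taken from any draft. It also applies this section's rule that ALT, if present, should be the first subelement.

```python
import re
from typing import List

# <!ELEMENT name start-omission end-omission content>, e.g. <!ELEMENT X - - (A, B)>
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+([A-Za-z][A-Za-z0-9.-]*)\s+([-Oo])\s+([-Oo])\s+(.*?)\s*>",
    re.DOTALL,
)
NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def check_extension_element(decl: str, attlist: str = "") -> List[str]:
    """Return the section 8 guidelines that an extension element violates."""
    m = ELEMENT_DECL.fullmatch(decl.strip())
    if m is None:
        return ["not a simple element declaration"]
    name, _start_omit, end_omit, content = m.groups()
    problems: List[str] = []
    if end_omit.upper() == "O":
        problems.append(f"{name}: end-tag must not be omissible")
    if content.upper() == "EMPTY":
        problems.append(f"{name}: declared content must not be EMPTY")
    if "#CONREF" in attlist.upper():
        problems.append(f"{name}: content reference attributes are not allowed")
    # Element names in the content model, in order of appearance.
    names = NAME.findall(content)
    if "ALT" in names and names[0] != "ALT":
        problems.append(f"{name}: ALT should be the first subelement")
    return problems


print(check_extension_element("<!ELEMENT TABLE - - (ALT?, CAPTION?, TR+)>"))
# prints []
print(check_extension_element("<!ELEMENT WIDGET - O EMPTY>"))
```

The check is intentionally limited to extension elements; per the note above, elements that are legal only inside extension elements would simply not be passed to it.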
The current Tables, Frames, and EMBED proposals all satisfy these restrictions on end-tag omission, EMPTY content, and content reference attributes. Requiring end-tags on extension elements will allow heuristic parsers to ``re-synchronize'' the element hierarchy even in the presence of subelements without end-tags.

It is not anticipated that all or even most extension elements will require an alternate representation. For example, the HTML 3 / Netscape 2.0 BIG and SMALL elements can safely be ignored by browsers without losing information, so an alternate representation for these elements would not be necessary.

To support ``on the fly'' formatting, an ALT element, if present, should be the first subelement of the element to which it applies.

9. POTENTIAL PROBLEMS

The user community may be confused by the dual use of the name ALT as an element name and as an attribute name (on the IMG element) [7]. This is further exacerbated by the widespread (and incorrect) practice of referring to all syntactic constructs as ``tags'' instead of distinguishing between element names, attribute names, markup declarations, delimiters, and actual tags. If this is felt to be a serious problem, ALT could be renamed to ALTERNATE or something else.

[[ See also [8]; I believe this has been addressed by requiring user agents to keep track of the element hierarchy instead of discarding tags. ]]

10. ACKNOWLEDGMENTS AND HISTORY

The idea of including an alternate representation in the document was first introduced with the ALT attribute on the IMG element. This was further refined in HTML 3 with the FIG element, which directly contains its alternate representation. The proposed FRAMESET and EMBED extensions took this a step further by introducing explicit container elements for this purpose. The current proposal simply generalizes and formalizes this basic idea.

Discussion on the html-wg mailing list has provided invaluable input exploring all the issues involved.

A. OTHER SOLUTIONS

A number of other approaches to this problem have been suggested.
[[ This section is a bit of a mess right now... -JE ]]

A.1. ALT ATTRIBUTE INSTEAD OF ELEMENT

It has been suggested that the alternate representation might appear in an attribute, as it does with IMG [9]. Due to the severe limitations of this approach (an attribute value can contain only character data, not markup), this is not advisable [10].

A.2. ALTSRC ATTRIBUTE

Another approach is to supply, in an attribute of extension elements, the URI of a document containing an alternate representation. The attribute would have a standardized name, say ALTSRC. For example:

      <TABLE ALTSRC="table1.txt">
      <CAPTION>Table 1</CAPTION>
      ...
      </TABLE>
where table1.txt contains a preformatted, plain-text rendering of the table. Under this scheme, user agents would check for an ALTSRC attribute on start-tags with an unrecognized element name instead of completely ignoring them. If such an attribute is found, the user agent would discard the content of the unrecognized element and display the referenced URI either inline or as a hyperlink.

This has the advantage of transmitting the alternate representation only if it is actually needed, saving transmission time. It would also help keep source documents less ``cluttered,'' since it would not be necessary to duplicate information in the main document.

   Note: This solution could be used in addition to the current proposal; the two are mutually compatible.

A.3. NOXXX ELEMENTS

Another approach is to define a new alternate representation element for each new feature (e.g., NOFRAMES [2] and NOEMBED [3]), instead of using a standardized element name. This works when the extension element has no other textual content (as is the case with FRAMESET and EMBED), but not for extension elements with primary content. For example, if a user agent does not know about the TABLE element, it will not know that a (hypothetical) NOTABLES element contains an alternate representation either, and would still attempt to display the TABLE content under the ``ignore unrecognized tags'' rule.

   Note: A naming convention for generic identifiers (for example, assuming that an unrecognized element name NOxxx is an alternate representation of a new xxx element) is dangerous and ill-advised.

A.4. CONDITIONAL ELEMENT

It has been suggested that the ALT element take a FEATURE attribute, which would be used to determine whether or not the ALT content should be displayed. Under this scheme, the ALT element may appear before, instead of inside, the extended element. [[ Citation?
]]

A similar proposal calls for an OPTION element with PRESENT and ABSENT attributes, whose values would be lists of ``feature keywords''; the content should only be displayed if the listed features are supported or unsupported, respectively [7].

Both of these schemes work on a per-feature basis instead of a per-element-instance basis, so they are more coarse-grained, and hence less flexible, than the current proposal. I feel they are also more error-prone and less intuitive. The current proposal uses containment to express the relationship between an element and its alternate representation; in a conditional inclusion scheme, this information is lost.

A.5. MARKED SECTIONS

Another suggestion is to ``modularize'' the DTD and include parameter entities for each module. These would be defined by the user agent as either INCLUDE or IGNORE, depending on whether or not the module is supported, and authors could use them as status keywords in marked section declarations [7].

This would require browsers to support marked sections (which they ought to anyway), and authors to have a much greater familiarity with SGML (also not a bad idea). On the down side, it requires a greater implementation effort and, like the conditional element scheme, obscures the relationship between the primary and alternate representations of an element. It is also likely to be confusing to the user community.

A.6. NO TAGS

In the HTML 3 draft, the FIG element's _content_ was the alternate representation. It has also been suggested that EMBED work this way:

      There is no need for redundant NOEMBED tags. Each EMBED is an implied choice between fetching the URL in question or rendering the enclosed content.

      (<199509250245.WAA29529@panix2.panix.com>) [[ Full citation? ]]

I find this less intuitive than supplying explicit start- and end-tags for the alternate content. Also, it does not allow extension elements to contain primary (non-alternate) content; this could be detrimental to future enhancements.
(For example, EMBED may eventually include subelements to be used as parameters for processing the embedded object.)

A.7. OMISSIBLE TAGS

The start- and end-tags for ALT could be made omissible:

      <!ELEMENT ALT O O %body.content>

This would allow current HTML 3 documents which use FIG to remain valid without being updated. Omitting the ALT start- and end-tags would defeat heuristic parsers in some cases, so providers would need to take care to include them where they might be necessary. This would apply only to extension elements which have textual primary content; current uses of FIG would still work.

REFERENCES

[[ Fill this in... Tables draft, Netscape's FRAMES and EMBED proposals, FIG discussions. ]]

[1]  Toward Graceful Deployment of Tables in HTML. Dan Connolly, 13-Mar-1995.

[2]  A Proposed Extension to HTML: Frames (<305E5CF5.45AE@netscape.com>). Eric Bina, 17-Sep-1995.

[3]  The REAL proposal for addition to HTML 3.0: EMBED (<305F9E53.712E@netscape.com>). Alex Edelstein, John Giannandrea, 19-Sep-1995.

[4]  HTML3 Tables. Dave Raggett, 25-Sep-1995.

[5]  HTML 2.0. Dan Connolly and Tim Berners-Lee.

[6]  HTML-WG Mailing List Archives. HyperMail archive of the HTML Working Group mailing list.

[7]  html-wg-95q3: Re: A proposal for addition to HTML 3.0: EMBED (<9509220353.AA25633@sqrex.sq.com>). Liam Quin.

[8]  html-wg-95q3: Re: A proposal for addition to HTML 3.0: EMBED. Alexei Kosut.

[9]  html-wg-95q3: ALTs for EMBED, etc (<9509220740.ZM4827@dmg.west.ora.com>). Terry Allen.

[10] html-wg-95q3: ALTs for EMBED, etc (<19950922.78180D8.9477@contessa.phone.net>). Mike Meyer.