1. Why an explicit DTD?
Because the receiver must be given the option to validate, rather
than simply take a role where it can be dumped on. Run time or
synchronous validation with respect to arbitrary DTDs/schemas is
impractical.
Down the road, provided the "canonical" DTD is well designed, the
DTD could be taken as architectural, allowing for a well-defined
set of variations. This needs a declarative feature, however
(ATTLISTs in an internal subset, while part of traditional AF
syntax, may not be the only way to go.)
2. Why is 'fault' at the top level?
Because I think fault detection should occur as early as possible
on the receiving end. Also, if the fault description mechanism
needs extension or elaboration in the future, it's best to plan for
this now by treating faults as a distinct message type altogether
(rather than as a subset of the 'result' type.)
3. What's with 'data'?
Probably overspecification. The motivation was to provide for a
canonical format to handle forward references (inevitable in data
graphs with cycles.) In retrospect, canonicalization is probably
overkill - we should bend the rules for runtime optimizations.
I think getting rid of 'data' (and allowing for tactical variance from
the strict composition semantic of the element hierarchy) will need
another attribute, either on 'scalar' or on the types that can contain
scalars (to record the fact that direct containment is tactical rather
than factual - we'll need this for the multi-reference case also,
since the target will have to be recorded somewhere.)
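A minimal sketch of the multi-reference case, assuming hypothetical
'id' and 'ref' attributes (neither is named above) as the extra
attribute that records the target:

```xml
<!-- hypothetical sketch: 'id' and 'ref' are assumed attribute names -->
<struct>
  <!-- 'a' contains the value directly and labels it for reuse -->
  <scalar name="a" type="string" id="v1">shared value</scalar>
  <!-- 'b' is bound to the same value: containment here is tactical,
       not factual -->
  <scalar name="b" ref="v1"/>
</struct>
```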
4. What is 'notice'?
Something for the messaging crowd. The semantic to be supported is
notification, and the assumption is that the data content of the
message is application defined completely. I have the content model
as ANY, but this could just as easily be #PCDATA, with the contents
wrapped in a CDATA section. The idea is that our framework offers no
services beyond a pure passthrough in this case. This is also a poor
man's extension mechanism.
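With declared content ANY, a 'notice' instance might look like this
(the payload element is invented for illustration; only 'notice'
itself comes from the text):

```xml
<!-- the framework interprets nothing below 'notice': pure passthrough -->
<notice>
  <stock-alert symbol="XYZ" price="42.10"/>  <!-- application-defined -->
</notice>
```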
5. Why both 'type' and 'class' attributes?
To express the very important distinction between structure and
semantics. 'type' is for structural information that could affect
things like in-core format and construction algorithms; 'class' is for
semantic information - such as 'object class' or 'package to bless
this structure into'.
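A hedged sketch of the split (attribute placement is assumed, not
specified above): 'type' tells the receiver how to build the thing in
memory, 'class' tells it what the thing means.

```xml
<!-- 'type' is structural; 'class' is semantic, e.g. the Perl package
     to bless this structure into -->
<struct class="My::Point">
  <scalar name="x" type="int">3</scalar>
  <scalar name="y" type="int">4</scalar>
</struct>
```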
6. Then what about 'name' attributes?
This is for formals - names of struct members, argument lists, etc.
The general semantic here is based on the Lisp-ish distinction between
names and values: values exist by themselves, while names have values
bound to them. It is possible, if not likely, for more than one name
to have the same value (in the sense of 'object identity') bound, so
the correct representation must be one where the name in question is
in markup separate from the markup used to represent the value.
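A sketch of that separation, with the name carried as an attribute,
distinct from the element content that carries the value:

```xml
<struct>
  <!-- two names, each binding a value via element content -->
  <scalar name="count" type="int">10</scalar>
  <scalar name="limit" type="int">10</scalar>
  <!-- equal values, but distinct bindings; true shared identity would
       need the multi-reference mechanism discussed under 'data' -->
</struct>
```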
7. Why an explicit 'null'?
Optimization: e.g., for non-sparse arrays, 'null' can be used for
fill (hence the 'count' attribute.)
Also, since XML has lost the SGML distinction usually accorded to
EMPTY declared content (i.e. in XML <result/> and <result></result>
*are* equivalent, sadly), I don't want to rule out the need to
distinguish <result/> from <result></result> - the latter being
what otherwise I could have used to represent <null/>, while the
former would be the *expected* result of a 'void' procedure.
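A hedged example of 'null' as array fill (whether 'null' sits as a
sibling of 'item' or inside one is an open layout question; sibling is
assumed here):

```xml
<array type="linear" dim="6">
  <item><scalar type="int">1</scalar></item>
  <null count="4"/>  <!-- positions 2-5 are null fill -->
  <item><scalar type="int">9</scalar></item>
</array>
```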
8. Array support.
Through the 'dim' and 'index' attributes. I forgot to add a 'unit'
attribute to the 'array' element type, to represent the size or
granularity of the atomic items in the array. Interpretation is
controlled by the value of the 'type' attribute.
- 'linear': one-dimensional array. 'dim' is a single number to
record the length.
- 'multi': row-major linearization of a multi-dimensional array,
'dim' being a space separated list of numbers (the product of
which is the overall length.)
- 'sparse': possibly random order linearization, 'dim' gives overall
size just like 'multi', while the 'index' attribute in each 'item'
locates the position, as a corresponding set of coordinates.
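The three interpretations might serialize as follows (a sketch: only
'array', 'item', 'type', 'dim', and 'index' are named above; the
nested 'scalar' content is assumed):

```xml
<!-- linear: 'dim' records the length -->
<array type="linear" dim="3">
  <item><scalar type="int">1</scalar></item>
  <item><scalar type="int">2</scalar></item>
  <item><scalar type="int">3</scalar></item>
</array>

<!-- multi: row-major 2x3 linearization, so six items in row order -->
<array type="multi" dim="2 3">
  <item><scalar type="int">1</scalar></item>
  <!-- ... five more items ... -->
</array>

<!-- sparse: each item carries its own coordinates -->
<array type="sparse" dim="100 100">
  <item index="7 42"><scalar type="int">1</scalar></item>
</array>
```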
9. Why a single 'scalar' type?
Because the universe of scalar distinctions is unbounded. The trap to
avoid is overspecification of ontological distinctions ("integer",
"float", "string', "foo") that really don't matter in a *text* format.
Effectively, the #PCDATA content of a scalar *is* always subject to a
controlling notation - but messing with NOTATION declarations and
attributes and the like seemed overkill here. Hence the #REQUIRED
status of the 'type' attribute, as a minimal stand-in for all such
considerations.
We should assume that the deserialization logic "knows" about the
various scalars supported. If this seems unreasonable, then the
'type' attribute should be restricted to a name token group. An open
issue is whether a separate 'encoding' attribute is needed.
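Under this scheme the 'type' value acts as the controlling notation
for the #PCDATA; a couple of hedged instances (the type names are
illustrative, not drawn from any fixed token group):

```xml
<scalar type="int">42</scalar>
<scalar type="float">3.14159</scalar>
<!-- any notation the deserialization logic "knows", e.g. a date -->
<scalar type="ISO-8601">2000-02-29</scalar>
```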
10. What's with 'map' and 'pair'?
Don't want to leave the Scheme-ers out! :)  Actually, I think 'pair'
captures a basic structural data type (association.) It may not be
used - or perhaps could be used for a variant representation of hashes
- but I think it belongs in the spec for completeness.
Also note that ordinary Perl hashes (where keys are known to be pure
strings, rather than possibly stringified representations) should use
the 'struct' type for encoding. Map+pair could be used to undo
stringification complications in other cases (as well as for hashes in
other schemas that don't abide by the Perl-ish restriction of string
keys, e.g. 'atom's for keys.)
[I believe complete orthogonality - only one way to do it - is a
chimera if we're trying to be as inclusive as possible in the spec.]
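A sketch of map+pair for the non-string-key case ('pair' is assumed to
hold key then value as its first and second children; that ordering is
not specified above):

```xml
<map>
  <pair>
    <scalar type="atom">red</scalar>   <!-- key: not a Perl string -->
    <scalar type="int">10</scalar>     <!-- value -->
  </pair>
</map>
```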
11. Issues not discussed yet:
- headers (such as 'mustUnderstand') and meta-data in general
- alternate transport(s)
- digital signatures
- multiple return values