Author: W. Eliot Kimber
Describes the author's attempt to use the Resource Description Framework (RDF) data model as an SGML document architecture conforming to the Architectural Forms Definition Requirements annex of ISO/IEC 10744:1997 (HyTime Second Edition). Includes an explanation of the RDF base architecture as well as examples of using that architecture, derived from the examples used in the RDF working draft.
Copyright (c) 1997 W. Eliot Kimber and ISOGEN International Corp.
The Resource Description Framework is formally defined as a set of XML element types. As such, it can be easily adapted for use as an SGML architecture. SGML architectures differ from normal document types in that they define a set of meta element types that are used as templates for element types in documents (or other architectures). They are in all respects like normal document types except that they are used by reference rather than being part of the document syntactically.
The Architectural Forms Definition Requirements (AFDR) annex of ISO/IEC 10744:1997 defines the formal mechanism for declaring and using architectures. The AFDR mechanism includes a formal, machine-processible mechanism for mapping from elements in documents to element 'forms' declared for architectures. Creating such a mapping serves to formally derive some or all of the elements in a document from the forms in the architecture. This formal derivation enables processing of elements based on the meta-element types they are derived from. One of the key features of architectural mapping is that anything that is not mapped to an architecture is ignored for the purposes of processing with respect to that architecture. This makes it easier to intermingle elements derived from different architectures with fewer conflicts among the requirements imposed by the architectures.
Architectures enable the definition of a set of semantics while imposing on the documents that invoke those semantics only those constraints needed to meet the requirements of the architecture. In particular, architectures never impose constraints on element type names and attribute names, and need not impose constraints on order of occurrence. Architectures never impose constraints on element type names and attribute names because the architectural mapping from document types to architectural types is indirect, allowing any document element type to be mapped to any architectural form.
The architectural mapping mechanism includes an automatic mapping mechanism that can significantly simplify the task of defining mappings. This mechanism is explained in more detail below.
The RDF defines a set of element types with clear semantics intended to be specialized for domain-specific use. Thus the RDF is already an architecture conceptually and is therefore a natural candidate for use as a formal SGML document architecture.
Making the RDF into an architecture required three things:
As the RDF spec doesn't use the SGML declaration syntax (instead, it uses productions that show the actual XML document syntax), I had to create the declarations myself. However, it was not hard to infer the declarations from the RDF spec's productions.
You give a name to an architecture by declaring it as a notation and giving the notation a public identifier (or URN). I created an SGML public ID for the RDF architecture and used the URL of the RDF spec as a substitute URN (URLs are not URNs) by using it as the system identifier for the notation. The system identifiers for notations are expected to get you to the documentation for the notation, so using the URL of the RDF spec is exactly the right thing to do.
The resulting declaration set, the RDF Base Architecture meta-DTD, can be found at http://www.isogen.com/demos/RDF/rdfbase.mdt. This declaration serves as the complete formal declaration of the RDF architecture and serves first and foremost as documentation. It can also be used by tools such as the SP parser to provide generalized architectural processing. In addition, by putting more declarations into the DOCTYPE declaration, the instance syntax can be significantly reduced. Sometimes it's useful to reduce instance syntax (such as when editing) and sometimes it's not. There's no free lunch, so some kind of mapping has to go somewhere. The architectural mechanism lets you choose whether to concentrate the mapping in the declarations or avoid declarations and put the mapping in the instance. The two forms are functionally equivalent and one can be transformed to the other without loss. For example, a document with declarations can be transformed into a document without by making explicit in the instance attributes that are fixed in the declarations (which you might want to do for transmission, especially when the document instance is smaller than the declarations, even with the added attributes).
RDF-aware processors need never process these declarations as they presumably have the rules expressed in part by these declarations built into them. Assuming that popular Web browsers and servers are such processors, these programs would never need access to these declarations. However, full SGML systems that provide architectural processing features may need them or, as in the case of SP, require them.
Thus, within an XML use context, the machinery needed to use and invoke the architecture can be kept to a minimum because the processors that provide RDF-related functions can either use defined defaults or make reasonable assumptions about how documents relate to the RDF architecture. These defaulting mechanisms are discussed in some detail below.
In the examples, I have shown the full (or almost full) syntax versions of the different cases along with the minimal forms, both to make the relationship between the minimal forms and the formal mechanisms as clear as possible and to provide examples that can be easily used with the SP and Jade tools, which support both formal architectural processing and parsing of XML documents. In everyday use, only the minimized forms need to be used and RDF-aware processors need not provide generalized architectural processing facilities (although providing them is not particularly difficult once one has a generalized XML parser).
The RDF specification was not designed to be an architecture, nor was it designed as a full SGML (or XML) document type. Thus I had to make a few minor syntax changes from the RDF spec to make it work as an architecture. I did not intentionally change any of the semantics of the RDF objects reflected by the syntax defined in the RDF spec. (In the discussion that follows, I use "RDF spec" to refer to the original RDF specification and "RDF architecture" to refer to the architecture defined in this document.)
The RDF architecture meta-DTD is defined using full SGML syntax, as are all the derived meta-DTDs used in the examples that follow. This is because the formal architectural mechanism requires the use of SGML facilities XML doesn't support, especially data attributes. However, as these declarations are primarily design documents, it doesn't matter that they can't be used directly for XML documents. Equivalent declaration sets can be created that do conform to the XML constraints, if necessary. I have essentially made the assumption, for the purposes of this paper, that any generalized architectural processing will be done using full SGML tools (e.g., SP), and not XML-only processors.
The syntax differences between the RDF specification and the RDF architectural meta-DTD are:
Added an assertion-set element to serve as the document element for documents containing multiple assertions. This is required by the architectural approach, which must be able to construct complete documents ('architectural instances') from the data in the base document that applies to the architecture. As the RDF allows multiple assertion blocks in the same document, I had to add an element to contain them. However, this element has no semantics other than containment and does not affect the RDF data model itself.
ablock to provide element-based scoping of property domains.
property element form to allow explicit connection from a property to a defined property domain (name space).
property element, rather than non-terminal placeholder as in RDF spec. Base documents will normally declare elements with their own element types derived from
property (although they could use the
property to capture property names used in base documents. By default, the property name is taken to be the GI of the element in the client document, minus any prefix, if specified.
HREF attribute #CONREF for
item to reflect intent of RDF spec (element is empty if
It doesn't matter that XML doesn't support the #CONREF keyword of SGML--the
RDF architectural meta-DTD is a full-SGML declaration set and serves to express
the intent of the RDF specification. In XML documents, you would simply use
the empty tag syntax for elements that specify the #CONREF
prefix data attribute for the RDF architecture notation. This attribute performs the same function as the
AS attribute of the
namespace element. It defines the namespace prefix used for the element type names of elements in the client document. This value is stripped from the element type names of
property-form elements to determine the real property name.
This mechanism is not needed to disambiguate element type names in the
instance as the architectural mechanism is not sensitive to instance element
type names (because the name-space mapping is done using element attributes
and notation declarations). However, this mechanism supports the RDF-specific
semantic for determining property names from element type names in the absence
of an explicit value for the
propname attribute. It
does not conflict with any architectural mechanisms and helps to motivate
the use of domain-specific prefixes in architectures derived from the RDF
I changed the name 'as' to 'prefix' because the name 'as' was not descriptive
outside the context of the
namespace element (i.e.,
when used as an attribute of the RDFBase notation). It's defining a prefix,
so why not call it that?
RDFBridge architectural bridging element.
namespace element with a processing instruction, RDFnamespace. This PI can be used in declaration sets or in the instance.
The RDF mechanism essentially provides a way to encode property-value pairs where the property names are expected to be defined in some defined property name space ("property domain"). The RDF model allows property names to have prefixes that help distinguish property names from different property domains. It defines the notion of a 'name space prefix' and defines rules both for associating prefixes with properties and for distinguishing prefixes from property names (e.g., using a '::' separator). Note that with this mechanism, the actual characters used for the prefix separator could be anything (e.g. "..", "-", or whatever). The double colon convention has some attraction but its use is not necessary for the prefix mechanism to work.
This prefix mechanism is not strictly necessary to enable architectural
processing. In addition, the RDF architecture makes the property name an
attribute of the generic
property element. However, it
is convenient to be able to infer property names from element type names in
the absence of explicit element declarations (if there are explicit declarations,
the propname attribute value can be set by the declaration
irrespective of the element's element type name). In addition, it is likely
that different domains will define property names that have the same name
but different semantics or expected content. Such names cannot be used as
element type names in the same document and apply to both domains. One way
to remove the name clash is to add prefixes to the element type names reflecting
the domain from which the properties come. The RDF prefix mechanism both
provides guidance for choosing prefixes and makes it possible to infer the
real (unqualified) property name from element type names that include prefixes.
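The inference just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name infer_propname and its parameters are hypothetical (they do not come from the RDF spec or the RDF architecture), the separator defaults to the '::' convention, and names are compared literally with no case folding.

```python
def infer_propname(element_type_name, prefix=None, separator="::",
                   explicit_propname=None):
    """Return the property name for an element derived from the property form.

    Hypothetical helper: illustrates the defaulting rule only.
    """
    # An explicit propname value (from the instance or a fixed
    # declaration) takes precedence over any inference.
    if explicit_propname is not None:
        return explicit_propname
    # With no prefix declared for the property domain, the element
    # type name (GI) is taken to be the property name as-is.
    if not prefix:
        return element_type_name
    # Otherwise strip the domain prefix plus separator, if present.
    qualified = prefix + separator
    if element_type_name.startswith(qualified):
        return element_type_name[len(qualified):]
    return element_type_name


if __name__ == "__main__":
    print(infer_propname("gcf::suds", prefix="gcf"))                # suds
    print(infer_propname("pics-by", prefix="pics", separator="-"))  # by
```

An element whose name does not carry the declared prefix is returned unchanged, which is one possible reading of "minus any prefix, if specified"; a real processor might instead report such a name as an error.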
The RDF architecture makes the
propname an implied
attribute in the architecture. That means that it's up to the processor to
infer the value of the
propname attribute if it's not
specified. In this case, the processor will be an RDF-aware processor. Thus
the RDF architecture defines the algorithm for determining the value of
propname attributes by examining the element types of elements
derived from the
property element form as modified by
the domain prefix specified for the property name space. This type of inference
mechanism is no different in kind from any other application-specific semantic
for determining the values of implied attributes.
propname data attribute for that notation.
namespace attribute in the RDF spec).
I've provided three ways to do the same thing because there are different ways to declare and configure the use of an architecture. As this mechanism is intended to be used with XML documents where there may be no declarations at all and where we cannot use data attributes in any case, we need the PI. The data attribute approach is most convenient when using the full architectural declaration mechanism, where all the properties of an RDF name space can be defined in one place (on the notation declaration). I would expect the notation approach to be used primarily in centralized DTD declaration sets or architectures.
The use of a PI to set the prefix is appropriate (or rather, is not inappropriate) because it relates to an RDF-specific semantic for attribute value defaults, not to the parsing of the document. It satisfies the requirement of disambiguating properties from different semantic domains used in the same document.
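As a rough sketch of how an RDF-aware processor might read the RDFnamespace PI, the Python below parses the PI data as a series of name='value' pairs, the shape used in the examples in this paper (domain, href, prefix). The function name and the decision to treat the PI data as attribute-like pairs are assumptions made for illustration; this paper does not define a formal grammar for the PI.

```python
import re

# Matches name='value' or name="value" pairs inside the PI data.
_PSEUDO_ATTR = re.compile(
    r"""([A-Za-z_][\w.-]*)\s*=\s*(?:'([^']*)'|"([^"]*)")"""
)


def parse_rdfnamespace_pi(pi_data):
    """Return the name/value pairs found in RDFnamespace PI data.

    Hypothetical helper: names are lower-cased so that DOMAIN and
    domain are treated alike, which is an assumption.
    """
    attrs = {}
    for name, single, double in _PSEUDO_ATTR.findall(pi_data):
        # Exactly one of the two quoted groups participates in a match.
        attrs[name.lower()] = single or double
    return attrs


if __name__ == "__main__":
    data = "domain='gcf' href='http://www.gcf.org/v2.5' prefix='gcf'"
    print(parse_rdfnamespace_pi(data))
```

The resulting dictionary supplies the prefix that a processor would strip from element type names when defaulting property names.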
An SGML document architecture is intended as a base or template from which specialized document types or architectures are derived. If one architecture is derived from another, it forms a hierarchy of architectures, from the most general at the bottom, to the most specialized at the top. This is conceptually similar to creating object hierarchies in object-oriented programs, but is not quite the same because the architecture hierarchy is a hierarchy of semantic definitions, not working program objects. In addition, the architecture mechanism works with existing SGML syntax and parsing, which currently lacks any idea of inheritance of syntactic rules. So, for example, there is no literal inheritance of content models or attribute declarations, which you might expect to get from such a mechanism (there are other reasons why such inheritance is problematic at best, if not impractical in the general case).
The RDF data model is explicitly designed to be specialized, as evidenced
by the property object type, which represents a property name/value pair.
The RDF model does not define any property names and expects applications
of the RDF model to define property names by declaring element types where
the element type names are the application-specific property names. The RDF
design as written provides a minor problem because, unlike all the other object
types in the RDF model, the property object does not have a fixed element
type associated with it. However, to create an architecture for the RDF spec,
there must be a defined element type for elements that are properties to map
to. Thus the RDF architecture has an element form called
property. To capture the name of the property, the
property element has an attribute, propname, that
is set to the value of the property name used in the specialization. The
content of the
property element is the property value.
To use the RDF architecture with your documents, you create element types derived from the element forms in the RDF architecture. The easiest way to do this is to use the RDF architectural forms directly as element types, e.g.:
<?XML 1.0?>
<?IS10744 ArcBase RDFBase ?>
<MyDoc>
  <metadata>
    <title>My Document</title>
    <ablock>
      <property propname='subject'>me</property>
    </ablock>
  </metadata>
  ...
</MyDoc>
This approach works fine--the mapping from the elements in MyDoc to the RDF elements is direct and obvious, taking advantage of the automatic architectural mapping mechanism that automatically maps elements to architectural forms of the same name. However, it is unsatisfying for two reasons:
property element is verbose. It would be better if you could declare your own element types to define your own property names, eliminating the need to specify
propname attributes in the instance (either because the element type name is the property name or because the attribute value is fixed by declarations for the elements derived from the
item element type (common for lists). If you have such conflicts, you have to use different element type names for one set or the other and you probably don't want to change your existing names.
Thus, it's more likely that the RDF architecture will be used indirectly, by using specialized elements derived from the RDF forms. In your documents, you indicate the derivation by doing two things:
Thus, the previous example could be reworked to something like this:
<?XML 1.0?>
<?IS10744 ArcBase RDFBase ?>
<MyDoc>
  <metadata>
    <title>My Document</title>
    <ablock>
      <subject RDFBase='property'>me</subject>
    </ablock>
  </metadata>
  ...
</MyDoc>
The processing instruction
<?IS10744 ArcBase RDFBase ?> is a declaration that says that the name 'RDFBase' is the name
of a base architecture. This is the minimum declaration necessary to formally
indicate the use of an architecture. There are additional declarations that
can be used if necessary. These are discussed later. The keyword 'IS10744'
refers to International Standard ISO/IEC 10744:1997, the standard that defines
the meaning of this processing instruction. The keyword 'ArcBase' indicates
that the PI is listing the names of base architectures from which this document
is derived. The name 'RDFBase' is the name of the architecture. This name
is used as the name for the architectural form naming attribute if you don't
explicitly declare a different one.
Now you have an element called
subject that is
derived from the RDF architectural form property, as
indicated by the value 'property' of the RDFBase attribute.
There are no
RDFBase attributes on the
MyDoc elements because their mapping
to the RDF element forms is automatic by application of the automatic mapping
rules for architectures. By default, the document element of the base document
is automatically mapped to the document element of the architecture (for RDF, the
assertion-set element form). If an element has the
same name as a form in the architecture, it is also automatically mapped to
that form. Thus the
ablock element in the document is
automatically mapped to the
ablock form in the RDF architecture.
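The mapping rules just described can be illustrated with a small Python sketch. It is a simplification, not an implementation of the AFDR: the function name, the set of form names (a subset assumed from the forms mentioned in this paper), and the use of Python's xml.etree.ElementTree parser are all choices made for illustration. Renaming attributes, bridging forms, and the other architecture support attributes are not handled.

```python
import xml.etree.ElementTree as ET

# Assumed subset of the element forms of the RDF architecture.
RDF_FORMS = {"assertion-set", "ablock", "property", "namespace", "item"}
RDF_DOC_FORM = "assertion-set"


def map_to_forms(root, arch_attr="RDFBase", forms=RDF_FORMS,
                 doc_form=RDF_DOC_FORM):
    """Return (element type name, form or None) pairs in document order."""
    mapping = []
    for elem in root.iter():
        explicit = elem.get(arch_attr)
        if explicit is not None:
            form = explicit      # explicit form-naming attribute
        elif elem is root:
            form = doc_form      # document element maps to the document form
        elif elem.tag in forms:
            form = elem.tag      # automatic mapping by identical name
        else:
            form = None          # not architectural: ignored
        mapping.append((elem.tag, form))
    return mapping


if __name__ == "__main__":
    doc = ("<MyDoc><metadata><title>My Document</title><ablock>"
           "<subject RDFBase='property'>me</subject>"
           "</ablock></metadata></MyDoc>")
    for name, form in map_to_forms(ET.fromstring(doc)):
        print(name, "->", form)
```

Elements that map to None, such as metadata and title here, are simply invisible to processing with respect to the RDF architecture.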
The previous examples put the architecture mapping attributes on the element instances, which you have to do if you have documents without explicit DTD declarations. However, if you do have explicit DTD declarations, you can use fixed attributes or attributes with explicit default values to define the mapping, simplifying the instance syntax. For example, the document above could use these declarations:
<?XML 1.0?>
<!DOCTYPE MyDoc [
  <?IS10744 ArcBase RDFBase ?>
  <!ELEMENT MyDoc (Metadata, Content)>
  <!ELEMENT Metadata (title, ablock)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Ablock (subject)>
  <!ELEMENT Subject (#PCDATA) ><!-- The subject of the document -->
  <!ATTLIST Subject RDFBase NAME #FIXED 'property'>
  ...
]>
<MyDoc>
  <metadata>
    <title>My Document</title>
    <ablock>
      <subject>me</subject>
    </ablock>
  </metadata>
  ...
</MyDoc>
Here, the instance has been simplified. In this example, the instance is smaller than the declarations, but in a real environment, you might have many metadata properties and the declarations might be reused by many documents, making the cost of creating the declarations worth the effort.
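Going the other way, from a document that relies on fixed attributes in its declarations to one that carries the mapping in the instance, amounts to copying each fixed value onto the elements it applies to. The Python sketch below assumes the fixed attributes have already been read from the declarations into a dictionary keyed by element type name; the function name and that dictionary layout are hypothetical, and no DTD parsing is attempted.

```python
import xml.etree.ElementTree as ET


def make_fixed_attributes_explicit(root, fixed):
    """Copy fixed attribute values from declarations onto instance elements.

    `fixed` maps element type name -> {attribute name: fixed value}.
    Attributes already present in the instance are left untouched.
    """
    for elem in root.iter():
        for name, value in fixed.get(elem.tag, {}).items():
            if name not in elem.attrib:
                elem.set(name, value)
    return root


if __name__ == "__main__":
    # Stand-in for: <!ATTLIST Subject RDFBase NAME #FIXED 'property'>
    fixed = {"subject": {"RDFBase": "property"}}
    doc = ET.fromstring(
        "<MyDoc><metadata><title>My Document</title>"
        "<ablock><subject>me</subject></ablock></metadata></MyDoc>"
    )
    make_fixed_attributes_explicit(doc, fixed)
    print(ET.tostring(doc, encoding="unicode"))
```

After this step the declarations can be dropped for transmission without losing the architectural mapping.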
If you are using the RDF architecture in an ad-hoc manner to create whatever properties you happen to think of at the moment, then using the RDF architecture directly as a base architecture works fine. However, there are many domains that define specific metadata properties, such as the PICS specification, MARC records for cataloging documents in libraries, not to mention enterprise-specific metadata related to specific business processes. Any domain-specific set of properties can be expressed as a new SGML document architecture derived from the RDF architecture. Documents can then use these specialized RDF architectures directly (just as they could use the RDF architecture) or they can further specialize (if only to avoid name conflicts).
The purpose of deriving a new architecture from the RDF architecture is primarily to capture as a separate specification (and related meta-DTD) the rules for RDF properties in a specific domain. This separates the metadata design from the design of documents or document types that might use it and opens the way for additional specialization as needed without the need to change the original architecture. The RDF object model is simple enough that it is unlikely that any requirements will be found in the future requiring drastic change to the design itself.
To derive a new RDF architecture you do just what you would do to create a document with explicit DTD declarations. However, instead of using the declarations for documents, you use them indirectly as an architectural meta-DTD just as the RDF architecture meta-DTD is used above. Because architectural meta-DTDs are intended to be used by reference and because they act as part of the documentation for an architecture (as well as enabling machine validation of conformance to the architecture), it is important to make them a bit more formal than the minimum required in documents. In particular, you should both declare any notations for any architectures from which the architecture is derived, as well as provide a template for the notation declaration to be used for the architecture itself. As architectural meta-DTDs are usually only used by reference, these added declarations will rarely, if ever, be seen by anybody who doesn't need to see them.
Thus, to create a domain-specific metadata architecture derived from the RDF spec, you would do something like this:
<!-- Meta-DTD for my metadata architecture.

     This architecture is derived from the RDF architecture.

     Use the following declarations to declare the use of this architecture:

     <?IS10744 ArcBase MyMetadata ?>
     <!NOTATION MyMetadata SYSTEM '-//ME//NOTATION My Metadata Architecture//EN'>
     <?RDFnamespace domain='MyMetadata' href='http://www.me.com/mymetadata.html'>
-->
<?IS10744 ArcBase RDFBase >
<?IS10744 ArcBase RDFBase ?>
<!NOTATION RDFBase PUBLIC
  '-//W3C::RDF//NOTATION Resource Description Framework (RDF)Model and Syntax//EN'
  'http://www.w3.org/Member/9708/WD-rdf-syntax-970801.html'>
<!-- NOTE: assertion-set, ablock, copied directly from RDF meta-DTD. -->
<!element assertion-set -- A set of RDF assertions --
  - - (ablock*) >
<!element ablock -- Assertion block --
  - - (property*) >
<!attlist ablock
  href    -- Resource URI (resource to which this assertion applies) --
          CDATA #IMPLIED -- Default: ablock applies to containing resource --
  ID      -- Unique identifier for the assertion block --
          ID #IMPLIED
  reftype -- Resource named is another assertion block. This assertion block
             should apply to whatever resource that assertion block identifies. --
          (indirect) #IMPLIED -- Default: not indirect. --
>
<!element property -- Property assertion specification --
  - - (%property-value;)* >
<!attlist subject -- The subject of the document --
  propname -- The name of the property --
           -- NOTE: The value of this attribute is normally the GI of the
              element in the derived RDF architecture. --
           NAME #FIXED 'subject'
  href     -- URI of property value --
           CDATA #CONREF
  domain   NAME #FIXED 'MyMetadata'>
<!-- End of meta-DTD -->
A document would use this architecture just as you would use the RDF architecture:
<?XML 1.0?>
<?IS10744 ArcBase MyMetadata ?>
<MyDoc>
  <metadata>
    <title>My Document</title>
    <ablock>
      <subject>me</subject>
    </ablock>
  </metadata>
  ...
</MyDoc>
Note the two differences between this example and the nearly identical one above:
The subject element no longer requires an RDFBase attribute because its name matches the name of an element form in the MyMetadata architecture.
Note one interesting effect of declaring this specialized architecture: the instance syntax is further simplified without the need to have its own declarations to set the values of the architectural mapping attributes (because we've taken full advantage of the automatic architectural mapping rules).
Note also that the declaration set for MyDoc shown above could also have been used as an architectural meta-DTD, rather than as the DOCTYPE declaration for the instance. This would provide the same instance simplification. The document could be validated against the declarations with the same result in either case (because the declarations and the document are the same in both cases except that in the latter case, the declarations are not syntactically part of the document, but are used by reference).
All the examples in the RDF specification can be translated to use the architectural mechanism with a minimum of change. The only difficulty in doing this translation is, in some cases, figuring out whether a domain-specific architecture is warranted. I tried to determine this from the information in the examples, but I may have misunderstood the examples and therefore made architectures where none were needed or the reverse.
For the examples, I have defined the following external parameter entity, containing declarations for the assertion-set, ablock, and property element types, to simplify the creation of derived architectures and documents with explicit DTD declarations:
<!-- Re-usable declarations for assertion-set and ablock element types -->
<!-- Refer to as 'ablock.dtd' -->
<!ENTITY % ablock-name 'ablock' -- Change this if you change name of ablock element -->
<!ENTITY % ablock-content 'namespace*, property*' >
<!ELEMENT assertion-set (%ablock-name;)* >
<!ELEMENT %ablock-name; (%ablock-content;) >
<!ENTITY % property-value '#PCDATA | ablock' >
<!ELEMENT property (%property-value;) >
<!ATTLIST property
  propname CDATA #IMPLIED
  domain   NAME  #IMPLIED
  href     -- URI of property value --
           CDATA #CONREF >
The original PICS example (5.1.1) is:
<ablock>
  <namespace href='http://www.gcf.org/v2.5' as='gcf'/>
  <gcf::suds>0.5</gcf::suds>
  <gcf::density>0</gcf::density>
  <gcf::color>1</gcf::color>
</ablock>
Architectural version without architectural notation declaration:
<?IS10744 ArcBase RDFBase ?>
<ablock>
  <?RDFnamespace domain='gcf' href='http://www.gcf.org/v2.5' prefix='gcf'>
  <gcf::suds RDFbase='property'>0.5</gcf::suds>
  <gcf::density RDFbase='property'>0</gcf::density>
  <gcf::color RDFbase='property'>1</gcf::color>
</ablock>
Architectural version with architectural notation declaration (directly derived from RDFBase), using RDFnamespace PI to define property domain (name space):
<?XML 1.0 ?>
<!DOCTYPE ablock [
  <?IS10744 ArcBase RDFBase ?>
  <!NOTATION RDFBase PUBLIC
    '-//W3C::RDF//NOTATION Resource Description Framework (RDF)Model and Syntax//EN'
    'http://www.w3.org/Member/9708/WD-rdf-syntax-970801.html'>
  <?RDFnamespace domain='gcf' href='http://www.gcf.org/v2.5' prefix='gcf'>
]>
<ablock>
  <gcf::suds RDFbase='property'>0.5</gcf::suds>
  <gcf::density RDFbase='property'>0</gcf::density>
  <gcf::color RDFbase='property'>1</gcf::color>
</ablock>
Full SGML architectural version with all declarations and using notation to declare property domain:
<!DOCTYPE ablock [
  <?IS10744 ArcBase RDFBase ?>
  <!NOTATION RDFBase PUBLIC
    '-//W3C::RDF//NOTATION Resource Description Framework (RDF)Model and Syntax//EN'
    'http://www.w3.org/Member/9708/WD-rdf-syntax-970801.html' >
  <!ATTLIST #NOTATION RDFBase
    ArcFormA NAME  #FIXED 'RDFbase'
    ArcDTD   CDATA #FIXED 'RDFbase.meta-DTD'
    ArcDocF  NAME  #FIXED 'assertion-set'
    ArcNamrA NAME  #FIXED 'RDFbase-names'
    ArcBridF NAME  #FIXED 'RDFBridge' >
  <!NOTATION GCF SYSTEM 'http://www.gcf.org/v2.5'>
  <!ATTLIST #NOTATION GCF
    prefix  NAME #FIXED 'GCF'
    RDFBase NAME #FIXED 'RDFnamespace'>
  <!ENTITY % property-value 'gcf::suds | gcf::density | gcf::color'>
  <!ENTITY % ablock SYSTEM 'ablock.dtd'>
  %ablock;
  <!ELEMENT gcf::suds - - (#PCDATA) >
  <!ATTLIST gcf::suds
    propname NAME #FIXED "suds"
    RDFBase  NAME #FIXED "property" >
  <!ELEMENT gcf::density - - (#PCDATA) >
  <!ATTLIST gcf::density
    propname NAME #FIXED "density"
    RDFBase  NAME #FIXED "property" >
  <!ELEMENT gcf::color - - (#PCDATA) >
  <!ATTLIST gcf::color
    propname NAME #FIXED "color"
    RDFBase  NAME #FIXED "property" >
]>
<ablock>
  <gcf::suds>0.5</gcf::suds>
  <gcf::density>0</gcf::density>
  <gcf::color>1</gcf::color>
</ablock>
The second PICS example (5.1.3) adds a second assertion block and introduces additional property types and a second name space. As the additional property types are defined by the PICS schema, I assumed that the PICS schema would be best defined as an architecture. Here is the PICS architectural meta-DTD (reflecting the properties used in the example):
<!-- Declarations for PICS meta-DTD. Derived from RDFBase architecture -->
<!-- Refer to this architecture as 'PICS' -->
<!-- This notation is also a property domain name space. -->
<?IS10744 ArcBase RDFBase >
<!NOTATION RDFBase PUBLIC
  '-//W3C::RDF//NOTATION Resource Description Framework (RDF)Model and Syntax//EN'
  'http://www.w3.org/Member/9708/WD-rdf-syntax-970801.html' >
<!ATTLIST #NOTATION RDFBase
  ArcFormA NAME #FIXED 'RDFbase'
  ArcDocF  NAME #FIXED 'assertion-set'
  ArcNamrA NAME #FIXED 'RDFbase-names'
  ArcBridF NAME #FIXED 'RDFBridge' >
<!ENTITY % property-value 'by | on | until' >
<!ENTITY % ablock SYSTEM 'ablock.dtd' >
%ablock;
<!element by (#PCDATA) >
<!element on (#PCDATA) >
<!element until (#PCDATA) >
<!attlist (by | on | until)
  domain  NAME #FIXED 'pics'
  RDFBase NAME #FIXED 'property'>
<!-- End of meta-DTD -->
The document from example 5.1.3 would now look like this:
<?XML 1.0 ?>
<!DOCTYPE Document [
  <?IS10744 ArcBase RDFBase PICS ?>
  <!NOTATION PICS SYSTEM 'http://www.gcf.org/v2.5' >
  <?RDFnamespace domain='pics' href='http://www.gcf.org/v2.5' prefix='pics'?>
  <!ATTLIST pics::by pics NAME #FIXED 'by'>
  <!ATTLIST pics::on pics NAME #FIXED 'on'>
  <!ATTLIST pics::until pics NAME #FIXED 'until' >
]>
<Document>
  ...
  <ablock href='http://www.w3.org/PICS/Overview.html' id='block1'>
    <?RDFnamespace domain='gcf' href='http://www.gcf.org/v2.5' ?>
    <suds RDFbase='property'>0.5</suds>
    <density RDFbase='property'>0</density>
    <color RDFbase='property'>1</color>
  </ablock>
  <ablock href='#block1'>
    <pics::by>John Doe</pics::by>
    <pics::on>1994.11.05T08:15-0500</pics::on>
    <pics::until>1995.12.31T23:59-0000</pics::until>
  </ablock>
  ...
</Document>
There are two key changes from the RDF spec example:
Addition of RDFBase attributes for the first assertion block, required because those properties are not defined in an architecture.
Removal of the namespace element from the second assertion block. The name space identification is handled by the PICS architecture, which also defines the PICS name space.
The original example from the RDF spec is:
<ablock href='http://w3.org/PICS/Overview.html'>
  <namespace href='http://www.gcf.org/v2.5'/>
  <suds>0.5</suds>
  <density>0</density>
  <color>
    <ablock>
      <hue>1</hue>
      <lightness>45</lightness>
      <saturation>70</saturation>
    </ablock>
  </color>
</ablock>
The architecture version of this example would not be materially different
from any of the above, as the only addition is a nested
ablock, which has no architecture use implications in this case (because
the nested properties are from the same property domain as the property that
The first Dublin Core example simply provides a new property domain, the one defined by the Dublin Core. Like any other property domain, the domain can be declared as an architecture derived from the RDF, as a separate notation, or using the RDFnamespace PI. Because the Dublin Core is intended to be a widely-used property domain, it probably makes the most sense to define it as an architecture, as much for its documentary benefit as for its processing utility.
The original example is:
<ablock>
  <namespace href='http://www.oclc.org:5046/dublin_core/RDFschema'/>
  <title>The Taxonomy of Pumpkins</title>
  <creator>Ora Lassila</creator>
  <language>FIN</language>
</ablock>
From this we can define an architectural meta-DTD along these lines:
<!-- Declarations for Dublin Core meta-DTD. Derived from RDFBase architecture -->
<!-- Refer to this architecture as 'Dublin-Core' with the URL
     'http://www.oclc.org:5046/dublin_core/RDFschema' -->
<!-- This notation is also a property domain name space. -->
<?IS10744 ArcBase RDFBase >
<!NOTATION RDFBase PUBLIC
  '-//W3C::RDF//NOTATION Resource Description Framework (RDF)Model and Syntax//EN'
  'http://www.w3.org/Member/9708/WD-rdf-syntax-970801.html' >
<!ATTLIST #NOTATION RDFBase
  ArcFormA NAME #FIXED 'RDFbase'
  ArcDocF  NAME #FIXED 'assertion-set'
  ArcNamrA NAME #FIXED 'RDFbase-names'
  ArcBridF NAME #FIXED 'RDFBridge'>
<!ENTITY % property-value 'title | creator | language | property' >
<!ENTITY % ablock SYSTEM 'ablock.dtd' >%ablock;
<!element title (#PCDATA) >
<!element creator (#PCDATA) >
<!element language (#PCDATA) >
<!attlist (%property-value;)
  domain  NAME #FIXED 'Dublin-Core'
  RDFBase NAME #FIXED 'property'>
<!-- End of meta-DTD -->
This architecture could be used like so:
<?IS10744 ArcBase Dublin-Core ?>
<ablock>
  <title>The Taxonomy of Pumpkins</title>
  <creator>Ora Lassila</creator>
  <language>FIN</language>
</ablock>
Note that there is no need to explicitly name the property domain because the Dublin Core architecture is the property domain and the architectural meta-DTD associates the domain with the properties. Because this example is presumably not a complete document, it's not clear whether there would be, at a minimum, a declaration of the Dublin Core as an architecture using a notation declaration. The example as shown above assumes that the name 'Dublin-Core' is well enough known to obviate the need for the more complete declarations in most use scenarios. Remember that the whole purpose of architectures is to express agreements and conventions among the members of a community of interest such that outside observers can distinguish one set of conventions from another. Within the scope of the architecture's community of interest, a convention may be sufficiently ubiquitous that it does not require a more formal declaration of its use; the prime example of such an architecture is HTML.
The second Dublin Core example adds two new wrinkles: qualifying attributes for properties and domains referenced directly from a property specification. Both of these are provided for in the RDF architecture provided here. Here is the original example 5.2.2:
<ablock>
  <namespace href='http://www.oclc.org:5046/dublin_core/RDFschema'/>
  <title lang='FIN'>Kurpitsojen ja URLien alkeet</title>
  <creator>Ora Lassila</creator>
  <language>FIN</language>
  <subject namespace='http://purl.org/Schemas/LCSH'> Color and Color Palettes</subject>
</ablock>
Referencing property domains from properties is provided by the
domain attribute of the property element form, which is a reference
to a domain name declared as either an architecture that is also a property
domain, a property domain notation, or using the RDFnamespace PI.
Providing qualifying attributes for properties would be done as part of the declaration of an architecture derived from the RDFbase architecture. These qualifiers are only relevant to processing that understands the particular property domain and are unique to the semantics of the interpretation of a particular property, so there's no reason to codify them at the RDFBase level (for example, any qualifiers could be just as easily defined as part of the property value itself). In other words, we simply use whatever qualifying attributes we need.
Duplicating the name-space reference on the subject element merely requires providing a declaration of the name space and replacing the direct URL reference with a reference to the name-space name:
<ablock>
  <?RDFnamespace domain='Dublin-Core' href='http://www.oclc.org:5046/dublin_core/RDFschema' ?>
  <?RDFnamespace domain='LCSH' href='http://purl.org/Schemas/LCSH' ?>
  <title lang='FIN'>Kurpitsojen ja URLien alkeet</title>
  <creator>Ora Lassila</creator>
  <language>FIN</language>
  <subject domain='LCSH'> Color and Color Palettes</subject>
</ablock>
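For a processor, the RDFnamespace declarations amount to a lookup table from domain name to URL, which is then consulted whenever a property names a domain. The following Python sketch illustrates that lookup only. The function names (namespace_table, resolve_domain) and the regular-expression parsing are assumptions made for illustration; they are not part of the RDF draft or the AFDR, and a real system would obtain the processing instructions from its SGML or XML parser.

```python
import re

# Assumption: RDFnamespace PIs use single-quoted pseudo-attributes,
# as in the examples in this paper.
PI_RE = re.compile(r"<\?RDFnamespace\s+(.*?)\s*\?>", re.S)
ATTR_RE = re.compile(r"(\w+)\s*=\s*'([^']*)'")

def namespace_table(text):
    """Map each declared domain name to the URL given in its href."""
    table = {}
    for body in PI_RE.findall(text):
        attrs = dict(ATTR_RE.findall(body))
        table[attrs["domain"]] = attrs["href"]
    return table

def resolve_domain(table, name):
    """Return the URL for a domain name; raises KeyError if undeclared."""
    return table[name]

SAMPLE = """
<?RDFnamespace domain='Dublin-Core' href='http://www.oclc.org:5046/dublin_core/RDFschema' ?>
<?RDFnamespace domain='LCSH' href='http://purl.org/Schemas/LCSH' ?>
"""

table = namespace_table(SAMPLE)
lcsh_url = resolve_domain(table, "LCSH")
```

Here the domain name on the subject property resolves to the LCSH URL, while a reference to an undeclared domain fails loudly instead of being silently accepted.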
The lang attribute of the title property need simply be declared as part of the definition of the Dublin Core architecture, which requires the following new declarations for the title element form (modifying the meta-DTD shown in the previous example):
<!-- Modifications to Dublin Core architecture given above -->
<!element title (#PCDATA) >
<!attlist title
  lang    -- Name of natural language used for the title property value.
             Should be a 3-character ISO language code. --
          CDATA #IMPLIED -- Default: determined by context --
  domain  NAME #FIXED 'Dublin-Core'
  RDFBase NAME #FIXED 'property'>
This change does nothing more than add the lang attribute and describe its use.
The third Dublin Core example combines properties from two different property domains such that properties from one are sub-properties of another. In the example as given, prefixes are used to distinguish names in one domain from names in another. This is not strictly necessary with the architectural approach, as in this case the base property names don't conflict (although prefixes still might be useful to make what's going on in the instance clearer to observers). The distinction between the property sets is made through the architectural mappings of the different property-form elements. In particular, with respect to the Dublin Core architecture, the W3C properties will be captured as architectural bridging elements, which maintain the original element boundaries but don't express any special semantics.
To do this, we need to add an architectural bridging form to the Dublin Core architecture. This would look something like this:
<!ENTITY % property-value 'title | creator | language | property | D-C.Bridge' >
The previous declaration adds the element form D-C.Bridge to the allowed content of property-form elements. The next declarations define the D-C.Bridge element form. Note that it is derived from the RDFBridge element, which means that the bridging would carry through to RDF-specific processing as well as to Dublin Core-specific processing.
<!element D-C.Bridge -- Dublin Core architectural bridging element --
  ANY >
<!attlist D-C.Bridge
  RDFBase NAME #FIXED 'RDFBridge'>
We now need to define the W3C properties, which we can also do as an architecture (again on the assumption that these properties will benefit from a more formal declaration because of wide use). I will leave the declaration of that architecture as an exercise for the reader (it should be clear that it simply involves copying one of the declaration sets given above and changing a few names here and there).
The original example document is:
<ablock>
  <namespace href='http://www.oclc.org:5046/dublin_core/RDFschema'/>
  <namespace href='http://www.w3.org/Library/RDFschema' as='w3c'/>
  <w3c::accessionId>199707301124301</w3c::accessionId>
  <title lang='FIN'>Kurpitsojen ja URLien alkeet</title>
  <creator>
    <ablock>
      <w3c::authorGivenName>Ora</w3c::authorGivenName>
      <w3c::authorSurname>Lassila</w3c::authorSurname>
    </ablock>
  </creator>
  <language>FIN</language>
  <subject namespace='http://purl.org/Schemas/LCSH'> Color and Color Palettes</subject>
</ablock>
The architectural version of this document, which uses explicit element declarations (not shown), is given below. In addition to the element declarations, the external DOCTYPE subset includes these namespace declarations:
<?RDFnamespace domain='Dublin-Core' href='http://www.oclc.org:5046/dublin_core/RDFschema' ?>
<?RDFnamespace domain='w3c-library' href='http://www.w3.org/Library/RDFschema' prefix='w3c' ?>
<?RDFnamespace domain='LCSH' href='http://purl.org/Schemas/LCSH' ?>
The document is:
<?XML 1.0 ?>
<!DOCTYPE Ablock SYSTEM 'my-ablock.dtd' >
<ablock>
  <w3c::accessionId>199707301124301</w3c::accessionId>
  <title lang='FIN'>Kurpitsojen ja URLien alkeet</title>
  <creator>
    <ablock>
      <w3c::authorGivenName>Ora</w3c::authorGivenName>
      <w3c::authorSurname>Lassila</w3c::authorSurname>
    </ablock>
  </creator>
  <language>FIN</language>
  <subject domain='LCSH'> Color and Color Palettes</subject>
</ablock>
The only differences in the instance are the addition of the declarations, the movement of the namespace declarations to the external subset, and the reworking of the subject element to use the domain attribute.
More interesting is the domain-specific architectural interpretation of the above document. This can be demonstrated by creating the architectural instances for both the Dublin Core and W3C Library architectures. The Dublin Core architectural instance is:
<!DOCTYPE Ablock SYSTEM 'dublin-core.mdt' >
<ablock>
  <property domain='w3c-library' propname='accessionId'>199707301124301</property>
  <title lang='FIN'>Kurpitsojen ja URLien alkeet</title>
  <creator>
    <ablock>
      <D-C.Bridge>Ora</D-C.Bridge>
      <D-C.Bridge>Lassila</D-C.Bridge>
    </ablock>
  </creator>
  <language>FIN</language>
  <property domain='LCSH' propname='subject'> Color and Color Palettes</property>
</ablock>
The W3C Library architectural instance is:
<!DOCTYPE Ablock SYSTEM 'w3c-library.mdt' >
<ablock>
  <authorGivenName>Ora</authorGivenName>
  <authorSurname>Lassila</authorSurname>
</ablock>
Note that the W3C library architectural instance only includes those elements and data that apply directly to the W3C library schema, that is, the assertion block within the creator property--everything else is ignored for the purposes of applying W3C library-specific processing. Obviously, a system processing the entire document would combine the Dublin Core processing and W3C library processing in some way depending on the type of processing involved (whether that is forking the properties to different indexes, using different presentation style sheets, or what have you).
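The idea that an architectural instance contains only what is mapped to that architecture can be illustrated with a small Python sketch. This is a deliberately simplified model and not the AFDR processing algorithm: the Element class, the forms mapping table, and the rule that an unmapped element is dropped together with its data while its mapped descendants are kept are all assumptions made for illustration. The toy document covers only the title and creator properties from the example above.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    name: str                                   # element type name in the document
    forms: dict = field(default_factory=dict)   # architecture name -> form name (assumed table)
    text: str = ""
    children: list = field(default_factory=list)

def derive(element, architecture):
    """Return the architectural elements derived from `element` (a list)."""
    derived_children = []
    for child in element.children:
        derived_children.extend(derive(child, architecture))
    form = element.forms.get(architecture)
    if form is None:
        # Not mapped to this architecture: ignore the element and its data,
        # but keep any mapped descendants (simplifying assumption).
        return derived_children
    return [Element(form, {}, element.text, derived_children)]

def serialize(element):
    inner = element.text + "".join(serialize(c) for c in element.children)
    return f"<{element.name}>{inner}</{element.name}>"

W3C = "w3c-library"
doc = Element("ablock", children=[
    Element("title", text="Kurpitsojen ja URLien alkeet"),
    Element("creator", children=[
        Element("ablock", {W3C: "ablock"}, children=[
            Element("w3c::authorGivenName", {W3C: "authorGivenName"}, "Ora"),
            Element("w3c::authorSurname", {W3C: "authorSurname"}, "Lassila"),
        ]),
    ]),
])

w3c_instance = derive(doc, W3C)
w3c_markup = serialize(w3c_instance[0])
```

With these assumed mappings, the outer assertion block, the title, and the creator wrapper disappear from the W3C Library view, leaving only the nested assertion block and its two properties under their architectural form names.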
This is the most complex example so far and demonstrates the ability of architectures to formally define the relationships between different property domains that are combined in documents or new architectures. The machinery behind the covers is involved, but serves to simplify the instances. Everything that was done above with explicit declarations could have been done in the instance (that is, defining the name spaces and the mappings to the three architectures involved).
Note also that because domain-specific prefixes have been used in this case, an observer has some chance to make sense of the document without reference to the underlying (and hidden until asked for) architectural machinery. If they recognize 'ablock' as being from the RDF mechanism, they can guess that the contained elements are properties and go from there and probably be correct. It's only when the instance deviates significantly from the expected (but not required) structures that the architectural machinery is required in order to determine how to correctly interpret the document.
I am avoiding the manifests example because the RDF spec makes it clear that the idea is not fully cooked. In any case, the requirements raised by the manifests example are distinct from the property name domain requirements and their satisfaction would probably not rely on architectural mechanisms in any case. It's an interesting problem but outside the scope of this paper.
At this point, it should either be clear how the MARC examples can be reworked to use architectures or not. If it's not, providing the example won't help. If there is interest in a fully worked out architectural version of the MARC example, I will be happy to provide it.
This paper should have demonstrated the utility of the AFDR architecture mechanism to satisfy the requirements of the RDF specification for representation of RDF-conforming metadata as XML and SGML documents.
I am in the process of preparing working processes for the examples provided here, including extracting RDF-defined metadata, formatting metadata using a style sheet defined in terms of the RDF architecture, and other processing tasks.
Generated from the original SGML using the JADE DSSSL engine. Style sheet developed by the author.