Using the RDF Data Model as an SGML Architecture

Author: W. Eliot Kimber

Describes the author's attempt to use the Resource Description Framework (RDF) data model as an SGML document architecture conforming to the Architectural Form Definition Requirements annex of ISO/IEC 10744:1997 (HyTime Second Edition). Includes an explanation of the RDF base architecture as well as examples of using that architecture, derived from the examples in the RDF working draft.

Copyright (c) 1997 W. Eliot Kimber and ISOGEN International Corp.


References

  1. Resource Description Framework (RDF) Model and Syntax. URL: http://www.w3.org/PICS/Member/NG/WD-rdf-syntax-970801.html
  2. ISO/IEC 10744:1997 Hypermedia/Time-Based Structuring Language, 'Architectural Form Definition Requirements', Annex A.3. URL: http://www.drmacro.com/hythtml/clause-A.3.html

1. The RDF as an SGML Architecture

The Resource Description Framework is formally defined as a set of XML element types. As such, it can be easily adapted for use as an SGML architecture. SGML architectures differ from normal document types in that they define a set of meta element types that are used as templates for element types in documents (or other architectures). They are in all respects like normal document types except that they are used by reference rather than being part of the document syntactically.

The Architectural Form Definition Requirements (AFDR) annex of ISO/IEC 10744:1997 defines the formal mechanism for declaring and using architectures. The AFDR provides a formal, machine-processable mechanism for mapping from elements in documents to element 'forms' declared for architectures. Creating such a mapping serves to formally derive some or all of the elements in a document from the forms in the architecture. This formal derivation enables processing of elements based on the meta-element types they are derived from. One of the key features of architectural mapping is that anything that is not mapped to an architecture is ignored for the purposes of processing with respect to that architecture. This makes it easier to intermingle elements derived from different architectures with fewer conflicts among the requirements imposed by the architectures.

Architectures enable the definition of a set of semantics while imposing on the documents that invoke those semantics only those constraints needed to meet the requirements of the architecture. In particular, architectures need not impose constraints on order of occurrence, and they never impose constraints on element type names or attribute names: the architectural mapping from document types to architectural types is indirect, so any document element type can be mapped to any architectural form.

The architectural mapping mechanism includes an automatic mapping mechanism that can significantly simplify the task of defining mappings. This mechanism is explained in more detail below.

The RDF defines a set of element types with clear semantics intended to be specialized for domain-specific use. Thus the RDF is already an architecture conceptually and is therefore a natural candidate for use as a formal SGML document architecture.

The Process of Making RDF Into an Architecture

Making the RDF into an architecture required three things:

  1. Creating a set of element and attribute declarations that reflect the abstract object model defined by the RDF spec.
  2. Defining a unique name for the architecture so it can be referred to by the documents derived from it (e.g., an SGML public identifier or Internet URN).
  3. Defining a few RDF-specific processing actions to implement the property name space prefix rules defined in the RDF spec.

As the RDF spec doesn't use the SGML declaration syntax (instead, it uses productions that show the actual XML document syntax), I had to create the declarations myself. However, it was not hard to infer the declarations from the RDF spec's productions.

You give a name to an architecture by declaring it as a notation and giving the notation a public identifier (or URN). I created an SGML public ID for the RDF architecture and used the URL of the RDF spec as a substitute URN (URLs are not URNs) by using it as the system identifier for the notation. The system identifiers for notations are expected to get you to the documentation for the notation, so using the URL of the RDF spec is exactly the right thing to do.

The resulting declaration set, the RDF Base Architecture meta-DTD, can be found at http://www.isogen.com/demos/RDF/rdfbase.mdt. This declaration set serves as the complete formal declaration of the RDF architecture and serves first and foremost as documentation. It can also be used by tools such as the SP parser to provide generalized architectural processing. In addition, putting more declarations into the DOCTYPE declaration can significantly reduce the instance syntax. Sometimes it's useful to reduce instance syntax (such as when editing) and sometimes it's not. There's no free lunch, so some kind of mapping has to go somewhere. The architectural mechanism lets you choose whether to concentrate the mapping in the declarations or avoid declarations and put the mapping in the instance. The two forms are functionally equivalent and one can be transformed to the other without loss. For example, a document with declarations can be transformed into a document without them by making explicit in the instance the attributes that are fixed in the declarations (which you might want to do for transmission, especially when the document instance is smaller than the declarations, even with the added attributes).
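The transformation from the declaration-based form to the instance-only form can be sketched in a few lines. The following is a minimal illustration in Python, not part of any existing tool: the table FIXED_ATTRS and the function make_fixed_explicit are hypothetical names, and the table stands in for the fixed attributes that a real processor would read from the DOCTYPE declarations. The element and attribute names are borrowed from the examples later in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of #FIXED attributes, as a DOCTYPE might declare them:
# element type name -> {attribute name: fixed value}
FIXED_ATTRS = {"subject": {"RDFBase": "property"}}


def make_fixed_explicit(xml_text, fixed_attrs):
    """Return the instance with the declared fixed attributes written out."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        for name, value in fixed_attrs.get(elem.tag, {}).items():
            # Only add what the instance leaves unspecified.
            elem.attrib.setdefault(name, value)
    return ET.tostring(root, encoding="unicode")


minimal = "<ablock><subject>me</subject></ablock>"
explicit = make_fixed_explicit(minimal, FIXED_ATTRS)
print(explicit)
```

The result carries the mapping attribute on the subject element itself, so it can be processed without access to the declarations.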

RDF-aware processors need never process these declarations as they presumably have the rules expressed in part by these declarations built into them. Assuming that popular Web browsers and servers are such processors, these programs would never need access to these declarations. However, full SGML systems that provide architectural processing features may need them or, as in the case of SP, require them.

Thus, within an XML use context, the machinery needed to use and invoke the architecture can be kept to a minimum because the processors that provide RDF-related functions can either use defined defaults or make reasonable assumptions about how documents relate to the RDF architecture. These defaulting mechanisms are discussed in some detail below.

In the examples, I have shown the full (or almost full) syntax versions of the different cases along with the minimal forms, both to make the relationship of the minimal forms to the formal mechanisms as clear as possible and to provide examples that can be easily used with the SP and Jade tools, which support both formal architectural processing and parsing of XML documents.

In everyday use, only the minimized forms need to be used and RDF-aware processors need not provide generalized architectural processing facilities (although providing them is not particularly difficult once one has a generalized XML parser).

Syntax Differences between the RDF spec and the RDF architecture

The RDF specification was not designed to be an architecture, nor was it designed as a full SGML (or XML) document type. Thus I had to make a few minor syntax changes from the RDF spec to make it work as an architecture. I did not intentionally change any of the semantics of the RDF objects reflected by the syntax defined in the RDF spec. (In the discussion that follows, I use "RDF spec" to refer to the original RDF specification and "RDF architecture" to refer to the architecture defined in this document.)

The RDF architecture meta-DTD is defined using full SGML syntax, as are all the derived meta-DTDs used in the examples that follow. This is because the formal architectural mechanism requires the use of SGML facilities XML doesn't support, especially data attributes. However, as these declarations are primarily design documents, it doesn't matter that they can't be used directly for XML documents. Equivalent declaration sets can be created that do conform to the XML constraints, if necessary. I have essentially made the assumption, for the purposes of this paper, that any generalized architectural processing will be done using full SGML tools (e.g., SP), and not XML-only processors.

The syntax differences between the RDF specification and the RDF architectural meta-DTD are:

RDF-Specific Prefix Processing

The RDF mechanism essentially provides a way to encode property-value pairs where the property names are expected to be defined in some defined property name space ("property domain"). The RDF model allows property names to have prefixes that help distinguish property names from different property domains. It defines the notion of a 'name space prefix' and defines rules both for associating prefixes with properties and for distinguishing prefixes from property names (e.g., using a '::' separator). Note that with this mechanism, the actual characters used for the prefix separator could be anything (e.g. "..", "-", or whatever). The double colon convention has some attraction but its use is not necessary for the prefix mechanism to work.

This prefix mechanism is not strictly necessary to enable architectural processing. In addition, the RDF architecture makes the property name an attribute of the generic property element. However, it is convenient to be able to infer property names from element type names in the absence of explicit element declarations (if there are explicit declarations, the propname attribute value can be set by the declaration irrespective of the element's element type name). In addition, it is likely that different domains will define properties that have the same name but different semantics or expected content. A single element type name cannot stand for both properties in the same document. One way to remove the name clash is to add prefixes to the element type names reflecting the domain from which the properties come. The RDF prefix mechanism both provides guidance for choosing prefixes and makes it possible to infer the real (unqualified) property name from element type names that include prefixes.

The RDF architecture makes the propname an implied attribute in the architecture. That means that it's up to the processor to infer the value of the propname attribute if it's not specified. In this case, the processor will be an RDF-aware processor. Thus the RDF architecture defines the algorithm for determining the value of propname attributes by examining the element types of elements derived from the property element form as modified by the domain prefix specified for the property name space. This type of inference mechanism is no different in kind from any other application-specific semantic for determining the values of implied attributes.
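The kind of inference described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the normative algorithm: the function name infer_propname and its parameters are hypothetical, name comparison is assumed to be case-sensitive (as in XML), and the '::' separator is only the convention mentioned earlier.

```python
def infer_propname(gi, prefix=None, separator="::", explicit=None):
    """Infer the value of the propname attribute for a property element.

    gi        -- element type name (generic identifier) of the element
    prefix    -- prefix declared for the property domain, if any
    separator -- characters that divide the prefix from the property name
    explicit  -- propname value given in the instance or the declarations
    """
    if explicit is not None:
        # A specified (or fixed) value always wins over inference.
        return explicit
    if prefix:
        lead = prefix + separator
        if gi.startswith(lead):
            # Strip the domain prefix to recover the unqualified name.
            return gi[len(lead):]
    # No prefix, or the prefix does not apply: the GI is the property name.
    return gi


print(infer_propname("gcf::suds", prefix="gcf"))
print(infer_propname("subject"))
```

With the prefix 'gcf' in effect, an element of type gcf::suds yields the property name suds; an element whose type name carries no prefix yields its own name.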

The RDF architecture provides three ways to define the prefix associated with a particular name space:

I've provided three ways to do the same thing because there are different ways to declare and configure the use of an architecture. As this mechanism is intended to be used with XML documents where there may be no declarations at all and where we cannot use data attributes in any case, we need the PI. The data attribute approach is most convenient when using the full architectural declaration mechanism, where all the properties of an RDF name space can be defined in one place (on the notation declaration). I would expect the notation approach to be used primarily in centralized DTD declaration sets or architectures.

The use of a PI to set the prefix is appropriate (or rather, is not inappropriate) because it relates to an RDF-specific semantic for attribute value defaults, not to the parsing of the document. It satisfies the requirement of disambiguating properties from different semantic domains used in the same document.
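An RDF-aware processor has to pick the domain, href, and prefix values out of the RDFnamespace PI. The following is a minimal sketch in Python of one way to do that; the function name parse_rdfnamespace_pi is hypothetical, the pseudo-attribute names are those used in the examples in this paper, and the sketch accepts both the XML-style '?>' close and the SGML-style '>' close.

```python
import re

# Matches name='value' or name="value" pseudo-attributes.
_PSEUDO_ATTR = re.compile(r"""([A-Za-z][\w.-]*)\s*=\s*(['"])(.*?)\2""")


def parse_rdfnamespace_pi(pi_text):
    """Return the pseudo-attributes of an RDFnamespace PI as a dict,
    or None if the text is some other processing instruction."""
    match = re.match(r"<\?RDFnamespace\s+(.*?)\s*\??>\s*$",
                     pi_text.strip(), re.DOTALL)
    if match is None:
        return None
    return {name: value
            for name, _quote, value in _PSEUDO_ATTR.findall(match.group(1))}


pi = "<?RDFnamespace domain='gcf' href='http://www.gcf.org/v2.5' prefix='gcf'?>"
print(parse_rdfnamespace_pi(pi))
```

A processor could keep the resulting dictionaries in a table keyed by domain name and consult that table when inferring property names from prefixed element type names.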

2. Using the RDF Architecture

An SGML document architecture is intended as a base or template from which specialized document types or architectures are derived. If one architecture is derived from another, it forms a hierarchy of architectures, from the most general at the bottom, to the most specialized at the top. This is conceptually similar to creating object hierarchies in object-oriented programs, but is not quite the same because the architecture hierarchy is a hierarchy of semantic definitions, not working program objects. In addition, the architecture mechanism works with existing SGML syntax and parsing, which currently lacks any idea of inheritance of syntactic rules. So, for example, there is no literal inheritance of content models or attribute declarations, which you might expect to get from such a mechanism (there are other reasons why such inheritance is problematic at best, if not impractical in the general case).

The RDF data model is explicitly designed to be specialized, as evidenced by the property object type, which represents a property name/value pair. The RDF model does not define any property names and expects applications of the RDF model to define property names by declaring element types where the element type names are the application-specific property names. The RDF design as written provides a minor problem because, unlike all the other object types in the RDF model, the property object does not have a fixed element type associated with it. However, to create an architecture for the RDF spec, there must be a defined element type for elements that are properties to map to. Thus the RDF architecture has an element form called property. To capture the name of the property, the property element has an attribute, propname, that is set to the value of the property name used in the specialization. The content of the property element is the property value.

Using the RDF Architecture directly

To use the RDF architecture with your documents, you create element types derived from the element forms in the RDF architecture. The easiest way to do this is to use the RDF architectural forms directly as element types, e.g.:

<?XML 1.0?>
<?IS10744 ArcBase RDFBase ?>
<MyDoc>
 <metadata>
  <title>My Document</title>
  <ablock>
   <property propname='subject'>me</property>
  </ablock>
 </metadata>
 ...
</MyDoc>

This approach works fine--the mapping from the elements in MyDoc to the RDF elements is direct and obvious, taking advantage of the automatic architectural mapping mechanism that automatically maps elements to architectural forms of the same name. However, it is unsatisfying for two reasons:

Thus, it's more likely that the RDF architecture will be used indirectly, by using specialized elements derived from the RDF forms. In your documents, you indicate the derivation by doing two things:

Thus, the previous example could be reworked to something like this:

<?XML 1.0?>
<?IS10744 ArcBase RDFBase ?>
<MyDoc>
<metadata>
 <title>My Document</title>
 <ablock>
  <subject RDFBase='property'>me</subject>
 </ablock>
</metadata>
...
</MyDoc>

The processing instruction <?IS10744 ArcBase RDFBase ?> is a declaration that says that the name 'RDFBase' is the name of a base architecture. This is the minimum declaration necessary to formally indicate the use of an architecture. There are additional declarations that can be used if necessary. These are discussed later. The keyword 'IS10744' refers to International Standard ISO/IEC 10744:1997, the standard that defines the meaning of this processing instruction. The keyword 'ArcBase' indicates that the PI is listing the names of base architectures from which this document is derived. The name 'RDFBase' is the name of the architecture. This name is used as the name for the architectural form naming attribute if you don't explicitly declare a different one.
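Reading the architecture names out of this PI is straightforward. The following Python fragment is a minimal sketch, assuming the PI text is available as a string; the function name arcbase_names is hypothetical. It accepts both the '?>' close used in the XML examples and the '>' close used in the SGML meta-DTDs.

```python
import re


def arcbase_names(pi_text):
    """Return the architecture names listed in an IS10744 ArcBase PI.

    Returns an empty list if the text is not an ArcBase PI.
    """
    match = re.match(r"<\?IS10744\s+ArcBase\s+(.*?)\s*\??>\s*$",
                     pi_text.strip())
    if match is None:
        return []
    # Names are separated by white space; each is also the default name
    # of the architectural form naming attribute for its architecture.
    return match.group(1).split()


print(arcbase_names("<?IS10744 ArcBase RDFBase ?>"))
print(arcbase_names("<?IS10744 ArcBase RDFBase PICS ?>"))
```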

Now you have an element called subject that is derived from the RDF architectural form property, as indicated by the value 'property' of the RDFBase attribute.

There are no RDFBase attributes on the ablock and MyDoc elements because their mapping to the RDF element forms is automatic by application of the automatic mapping rules for architectures. By default, the document element of the base document is automatically mapped to the document element of the architecture (for RDF, the assertion-set element form). If an element has the same name as a form in the architecture, it is also automatically mapped to that form. Thus the ablock element in the document is automatically mapped to the ablock form in the RDF architecture.
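The mapping rules just described can be sketched as a small function. This is an illustration in Python only, not a substitute for the rules in the AFDR: the function name architectural_form and its parameters are hypothetical, the set of form names is limited to those named in this paper, and the order in which the rules are tried is an assumption.

```python
# Form names taken from the RDF architecture as described in this paper.
RDF_FORMS = {"assertion-set", "ablock", "property"}


def architectural_form(name, attrs, is_document_element=False,
                       arc_name="RDFBase", forms=RDF_FORMS,
                       doc_form="assertion-set"):
    """Return the form an element maps to, or None if it is not mapped."""
    explicit = attrs.get(arc_name)
    if explicit is not None:
        # An explicit architectural form attribute decides the mapping.
        return explicit
    if is_document_element:
        # The document element maps to the architecture's document form.
        return doc_form
    if name in forms:
        # Same name as a form in the architecture: automatic mapping.
        return name
    # Not mapped: ignored when processing with respect to this architecture.
    return None


print(architectural_form("MyDoc", {}, is_document_element=True))
print(architectural_form("ablock", {}))
print(architectural_form("subject", {"RDFBase": "property"}))
print(architectural_form("title", {}))
```

In the reworked example, MyDoc maps to assertion-set, ablock maps to ablock, subject maps to property, and metadata and title are not mapped at all, so an RDF processor would ignore them.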

The previous examples put the architecture mapping attributes on the element instances, which you have to do if you have documents without explicit DTD declarations. However, if you do have explicit DTD declarations, you can use fixed attributes or attributes with explicit default values to define the mapping, simplifying the instance syntax. For example, the document above could use these declarations:

<?XML 1.0?>
<!DOCTYPE MyDoc [
 <?IS10744 ArcBase RDFBase ?>
 <!ELEMENT MyDoc    (metadata, Content)>
 <!ELEMENT metadata (title, ablock)>
 <!ELEMENT title    (#PCDATA)>
 <!ELEMENT ablock   (subject)>
 <!ELEMENT subject  (#PCDATA) ><!-- The subject of the document -->
 <!ATTLIST subject
    RDFBase NAME #FIXED 'property'>
 ...
]>
<MyDoc>
<metadata>
 <title>My Document</title>
 <ablock>
  <subject>me</subject>
 </ablock>
</metadata>
...
</MyDoc>

Here, the instance has been simplified. In this example, the instance is smaller than the declarations, but in a real environment, you might have many metadata properties and the declarations might be reused by many documents, making the cost of creating the declarations worth the effort.

Deriving Domain-Specific RDF Architectures

If you are using the RDF architecture in an ad-hoc manner to create whatever properties you happen to think of at the moment, then using the RDF architecture directly as a base architecture works fine. However, there are many domains that define specific metadata properties, such as the PICS specification, MARC records for cataloging documents in libraries, not to mention enterprise-specific metadata related to specific business processes. Any domain-specific set of properties can be expressed as a new SGML document architecture derived from the RDF architecture. Documents can then use these specialized RDF architectures directly (just as they could use the RDF architecture) or they can further specialize (if only to avoid name conflicts).

The purpose of deriving a new architecture from the RDF architecture is primarily to capture as a separate specification (and related meta-DTD) the rules for RDF properties in a specific domain. This separates the metadata design from the design of documents or document types that might use it and opens the way for additional specialization as needed without the need to change the original architecture. The RDF object model is simple enough that it is unlikely that any requirements will be found in the future requiring drastic change to the design itself.

To derive a new RDF architecture you do just what you would do to create a document with explicit DTD declarations. However, instead of using the declarations for documents, you use them indirectly as an architectural meta-DTD just as the RDF architecture meta-DTD is used above. Because architectural meta-DTDs are intended to be used by reference and because they act as part of the documentation for an architecture (as well as enabling machine validation of conformance to the architecture), it is important to make them a bit more formal than the minimum required in documents. In particular, you should both declare notations for any architectures from which the architecture is derived and provide a template for the notation declaration to be used for the architecture itself. As architectural meta-DTDs are usually only used by reference, these added declarations will rarely, if ever, be seen by anybody who doesn't need to see them.

Thus, to create a domain-specific metadata architecture derived from the RDF spec, you would do something like this:

<!-- Meta-DTD for my metadata architecture.  This architecture is
     derived from the RDF architecture.
     Use the following declarations to declare the use of this
     architecture:

     <?IS10744 ArcBase MyMetadata ?>
     <!NOTATION MyMetadata
        PUBLIC '-//ME//NOTATION My Metadata Architecture//EN'>
     <?RDFnamespace domain='MyMetadata' href='http://www.me.com/mymetadata.html' ?>
-->
<?IS10744 ArcBase RDFBase >
<!NOTATION RDFBase
  PUBLIC '-//W3C::RDF//NOTATION
          Resource Description Framework (RDF)Model and Syntax//EN'
         'http://www.w3.org/Member/9708/WD-rdf-syntax-970801.html'>

<!-- NOTE: assertion-set, ablock, copied directly from RDF meta-DTD. -->
<!element assertion-set -- A set of RDF assertions --
  - - (ablock*)
>
<!element ablock -- Assertion block --
  - - (property*)
>
<!attlist ablock
  href    -- Resource URI (resource to which this assertion applies) --
    CDATA #IMPLIED -- Default: ablock applies to containing resource --
  ID      -- Unique identifier for the assertion block --
    ID    #IMPLIED
  reftype -- Resource named is another assertion block.  This assertion
             block should apply to whatever resource that assertion
             block identifies. --
    (indirect) #IMPLIED -- Default: not indirect. --
>
<!element property -- Property assertion specification --
  - - (%property-value;)*
>
<!attlist subject -- The subject of the document --
  propname -- The name of the property --
           -- NOTE: The value of this attribute is normally
              the GI of the element in the derived RDF
              architecture. --
    NAME #FIXED 'subject'
  href    -- URI of property value --
   CDATA #CONREF
  domain NAME #FIXED 'MyMetadata'>
<!-- End of meta-DTD -->

A document would use this architecture just as you would use the RDF architecture:

<?XML 1.0?>
<?IS10744 ArcBase MyMetadata ?>
<MyDoc>
<metadata>
 <title>My Document</title>
 <ablock>
  <subject>me</subject>
 </ablock>
</metadata>
...
</MyDoc>

Note the two differences between this example and the nearly identical one above:

Note one interesting effect of declaring this specialized architecture: the instance syntax is further simplified without the need to have its own declarations to set the values of the architectural mapping attributes (because we've taken full advantage of the automatic architectural mapping rules).

Note also that the declaration set for MyDoc shown above could also have been used as an architectural meta-DTD, rather than as the DOCTYPE declaration for the instance. This would provide the same instance simplification. The document could be validated against the declarations with the same result in either case (because the declarations and the document are the same in both cases except that in the latter case, the declarations are not syntactically part of the document, but are used by reference).

3. Duplicating the RDF Use Examples with the RDF Architecture

All the examples in the RDF specification can be translated to use the architectural mechanism with a minimum of change. The only difficulty in doing this translation is, in some cases, figuring out whether a domain-specific architecture is warranted. I tried to determine this from the information in the examples, but I may have misunderstood the examples and therefore made architectures where none were needed or the reverse.

To simplify the creation of derived architectures and documents with explicit DTD declarations, I have defined the following external parameter entity, which contains declarations for the assertion-set, ablock, and property element types:

<!-- Re-usable declarations for assertion-set, ablock, and property element types -->
<!-- Refer to as 'ablock.dtd' -->
<!ENTITY % ablock-name 'ablock' -- Change this if you change name of ablock element -->
<!ENTITY % ablock-content 'namespace*, property*' >
<!ELEMENT assertion-set (%ablock-name;)* >
<!ELEMENT %ablock-name; (%ablock-content;) >
<!ENTITY % property-value '#PCDATA | ablock' >
<!ELEMENT property (%property-value;)* >
<!ATTLIST property
   propname  CDATA #IMPLIED
   domain    NAME  #IMPLIED
   href    -- URI of property value --
     CDATA #CONREF
>

PICS Example 1

The original PICS example (5.1.1) is:

<ablock>
 <namespace href='http://www.gcf.org/v2.5' as='gcf'/>
 <gcf::suds>0.5</gcf::suds>
 <gcf::density>0</gcf::density>
 <gcf::color>1</gcf::color>
 </ablock>

Architectural version without architectural notation declaration:

<?IS10744 ArcBase RDFBase ?>
 <ablock>
 <?RDFnamespace domain='gcf' href='http://www.gcf.org/v2.5' prefix='gcf'?>
 <gcf::suds RDFBase='property'>0.5</gcf::suds>
 <gcf::density RDFBase='property'>0</gcf::density>
 <gcf::color RDFBase='property'>1</gcf::color>
 </ablock>

Architectural version with architectural notation declaration (directly derived from RDFBase), using RDFnamespace PI to define property domain (name space):


<?XML 1.0 ?>
<!DOCTYPE ablock [
 <?IS10744 ArcBase RDFBase ?>
 <!NOTATION RDFBase
    PUBLIC '-//W3C::RDF//NOTATION
            Resource Description Framework (RDF)Model and Syntax//EN'
           'http://www.w3.org/Member/9708/WD-rdf-syntax-970801.html'>
 <?RDFnamespace domain='gcf' href='http://www.gcf.org/v2.5' prefix='gcf'?>
]>
 <ablock>
 <gcf::suds RDFBase='property'>0.5</gcf::suds>
 <gcf::density RDFBase='property'>0</gcf::density>
 <gcf::color RDFBase='property'>1</gcf::color>
 </ablock>

Full SGML architectural version with all declarations and using notation to declare property domain:

<!DOCTYPE ablock [
 <?IS10744 ArcBase RDFBase ?>
 <!NOTATION RDFBase
    PUBLIC '-//W3C::RDF//NOTATION
            Resource Description Framework (RDF)Model and Syntax//EN'
           'http://www.w3.org/Member/9708/WD-rdf-syntax-970801.html'>
 <!ATTLIST #NOTATION RDFBase
       ArcFormA NAME  #FIXED 'RDFbase'
       ArcDTD   CDATA #FIXED 'RDFbase.meta-DTD'
       ArcDocF  NAME  #FIXED 'assertion-set'
       ArcNamrA NAME  #FIXED 'RDFbase-names'
       ArcBridF NAME  #FIXED 'RDFBridge'
 >
 <!NOTATION GCF SYSTEM 'http://www.gcf.org/v2.5'>
 <!ATTLIST #NOTATION GCF
     prefix  NAME #FIXED 'GCF'
     RDFBase NAME #FIXED 'RDFnamespace'>
 <!ENTITY % property-value 'gcf::suds | gcf::density | gcf::color'>
 <!ENTITY % ablock SYSTEM 'ablock.dtd'>
 %ablock;
 <!ELEMENT gcf::suds - - (#PCDATA) >
 <!ATTLIST gcf::suds
   propname NAME #FIXED "suds"
   RDFBase  NAME #FIXED "property" 
 >
 <!ELEMENT gcf::density - - (#PCDATA) >
 <!ATTLIST gcf::density
   propname NAME #FIXED "density"
   RDFBase  NAME #FIXED "property" 
 >
 <!ELEMENT gcf::color - - (#PCDATA) >
 <!ATTLIST gcf::color
   propname NAME #FIXED "color"
   RDFBase  NAME #FIXED "property" 
 >

]>
 <ablock>
 <gcf::suds>0.5</gcf::suds>
 <gcf::density>0</gcf::density>
 <gcf::color>1</gcf::color>
 </ablock>

PICS Example 2

The second PICS example (5.1.3) adds a second assertion block and introduces additional property types and a second name space. As the additional property types are defined by the PICS schema, I assumed that the PICS schema would be best defined as an architecture. Here is the PICS architectural meta-DTD (reflecting the properties used in the example):


<!-- Declarations for PICS meta-DTD.  Derived from RDFBase architecture -->
<!-- Refer to this architecture as 'PICS' -->
<!-- This notation is also a property domain name space.  -->
<?IS10744 ArcBase RDFBase >
<!NOTATION RDFBase
 PUBLIC '-//W3C::RDF//NOTATION Resource Description Framework (RDF)Model and Syntax//EN'
         'http://www.w3.org/Member/9708/WD-rdf-syntax-970801.html' >
<!ATTLIST #NOTATION
       RDFBase
       ArcFormA NAME  #FIXED 'RDFbase'
       ArcDocF  NAME  #FIXED 'assertion-set'
       ArcNamrA NAME  #FIXED 'RDFbase-names'
       ArcBridF NAME  #FIXED 'RDFBridge'
>
<!ENTITY % property-value 'by | on | until' >
<!ENTITY % ablock SYSTEM 'ablock.dtd' >
%ablock;
<!element by (#PCDATA) >
<!element on (#PCDATA) >
<!element until (#PCDATA) >
<!attlist (by | on | until)
   domain  NAME #FIXED 'pics'
   RDFBase NAME #FIXED 'property'>
<!-- End of meta-DTD -->

The document from example 5.1.3 would now look like this:

<?XML 1.0 ?>
<!DOCTYPE Document [
 <?IS10744 ArcBase RDFBase PICS ?>
<!NOTATION PICS
    SYSTEM 'http://www.gcf.org/v2.5'
>
<?RDFnamespace domain='pics' href='http://www.gcf.org/v2.5' prefix='pics'?>
<!ATTLIST pics::by
    PICS NAME #FIXED 'by'>
<!ATTLIST pics::on
    PICS NAME #FIXED 'on'>
<!ATTLIST pics::until
    PICS NAME #FIXED 'until' >
]>
<Document>
...
<ablock href='http://www.w3.org/PICS/Overview.html' id='block1'>
 <?RDFnamespace domain='gcf' href='http://www.gcf.org/v2.5' ?>
 <suds RDFBase='property'>0.5</suds>
 <density RDFBase='property'>0</density>
 <color RDFBase='property'>1</color>
 </ablock>
<ablock href='#block1'>
 <pics::by>John Doe</pics::by>
 <pics::on>1994.11.05T08:15-0500</pics::on>
 <pics::until>1995.12.31T23:59-0000</pics::until>
 </ablock>
...
</Document>

There are two key changes from the RDF spec example:

PICS Example 3 (5.1.4)

The original example from the RDF spec is:

<ablock href='http://w3.org/PICS/Overview.html'>
 <namespace href='http://www.gcf.org/v2.5'/>
 <suds>0.5</suds>
 <density>0</density>
 <color>
   <ablock>
     <hue>1</hue>
     <lightness>45</lightness>
     <saturation>70</saturation>
     </ablock>
   </color>
 </ablock>

The architecture version of this example would not be materially different from any of the above, as the only addition is a nested ablock, which has no architecture use implications in this case (because the nested properties are from the same property domain as the property that contains them).
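To make the nesting concrete, here is a minimal sketch in Python of how a processor might collect the properties of this example, nested ablock included, into a dictionary. It is an illustration only: the function name ablock_properties is hypothetical, and the sketch assumes the RDF spec syntax shown above, in which every child of an ablock other than namespace is a property whose element type name is the property name.

```python
import xml.etree.ElementTree as ET

SPEC_EXAMPLE = """
<ablock href='http://w3.org/PICS/Overview.html'>
 <namespace href='http://www.gcf.org/v2.5'/>
 <suds>0.5</suds>
 <density>0</density>
 <color>
   <ablock>
     <hue>1</hue>
     <lightness>45</lightness>
     <saturation>70</saturation>
   </ablock>
 </color>
</ablock>
"""


def ablock_properties(ablock):
    """Collect the properties of an ablock element into a dict."""
    props = {}
    for child in ablock:
        if child.tag == "namespace":
            continue  # name space declaration, not a property
        nested = child.find("ablock")
        if nested is not None:
            # The property value is itself a block of assertions.
            props[child.tag] = ablock_properties(nested)
        else:
            props[child.tag] = (child.text or "").strip()
    return props


properties = ablock_properties(ET.fromstring(SPEC_EXAMPLE))
print(properties)
```

The value of the color property comes back as a nested dictionary holding hue, lightness, and saturation, which mirrors the point made above: the nested ablock changes the shape of the value, not the way the architecture is used.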

Dublin Core Example 1 (5.2.1)

The first Dublin Core example simply provides a new property domain, the one defined by the Dublin Core. Like any other property domain, the domain can be declared as an architecture derived from the RDF, as a separate notation, or using the RDFnamespace PI. Because the Dublin Core is intended to be a widely-used property domain, it probably makes the most sense to define it as an architecture, as much for its documentary benefit as for its processing utility.

The original example is:

<ablock>
 <namespace href='http://www.oclc.org:5046/dublin_core/RDFschema'/>
 <title>The Taxonomy of Pumpkins</title>
 <creator>Ora Lassila</creator>
 <language>FIN</language>
 </ablock>

From this we can define an architectural meta-DTD along these lines:


<!-- Declarations for Dublin Core meta-DTD.  Derived from RDFBase architecture -->
<!-- Refer to this architecture as 'Dublin-Core' with the URL
     'http://www.oclc.org:5046/dublin_core/RDFschema'  -->
<!-- This notation is also a property domain name space.  -->
<?IS10744 ArcBase RDFBase >
<!NOTATION RDFBase
 PUBLIC '-//W3C::RDF//NOTATION Resource Description Framework (RDF)Model and Syntax//EN'
         'http://www.w3.org/Member/9708/WD-rdf-syntax-970801.html' >
<!ATTLIST #NOTATION
       RDFBase
       ArcFormA NAME  #FIXED 'RDFbase'
       ArcDocF  NAME  #FIXED 'assertion-set'
       ArcNamrA NAME  #FIXED 'RDFbase-names'
       ArcBridF NAME  #FIXED 'RDFBridge'>
<!ENTITY % property-value 'title | creator | language | property' >
<!ENTITY % ablock SYSTEM 'ablock.dtd' >
%ablock;
<!element title (#PCDATA) >
<!element creator (#PCDATA) >
<!element language (#PCDATA) >
<!attlist (%property-value;)
   domain  NAME #FIXED 'Dublin-Core'
   RDFBase NAME #FIXED 'property'>
<!-- End of meta-DTD -->

This architecture could be used like so:


<?IS10744 ArcBase Dublin-Core ?>
<ablock>
 <title>The Taxonomy of Pumpkins</title>
 <creator>Ora Lassila</creator>
 <language>FIN</language>
 </ablock>

Note that there is no need to explicitly name the property domain because the Dublin Core architecture is the property domain and the architectural meta-DTD associates the domain with the properties. In this example, as we're presumably not looking at a complete document, it's not clear whether there would be, at a minimum, a declaration of the Dublin Core as an architecture using a notation declaration. The example as shown above assumes that the name 'Dublin-Core' is well enough known to obviate the need for the more complete declarations in most use scenarios. Remember that the whole purpose of architectures is to express agreements and conventions among the members of a community of interest such that outside observers can distinguish one set of conventions from another. Within the scope of the architecture's community of interest, the convention may be sufficiently ubiquitous so as not to require a more formal declaration of its use; the prime example of such an architecture is HTML.

Dublin Core Example 2 (5.2.2)

The second Dublin Core example adds two new wrinkles: qualifying attributes for properties and domains referenced directly from a property specification. Both are supported by the RDF architecture described here. Here is the original example 5.2.2:

<ablock>
 <namespace href='http://www.oclc.org:5046/dublin_core/RDFschema'/>
 <title lang='FIN'>Kurpitsojen ja URLien alkeet</title>
 <creator>Ora Lassila</creator>
 <language>FIN</language>
 <subject namespace='http://purl.org/Schemas/LCSH'>
      Color and Color Palettes</subject>
 </ablock>

Referencing property domains from properties is provided by the domain attribute of the property element form, which is a reference to a domain name declared in one of three ways: as an architecture that is also a property domain, as a property domain notation, or with the RDFnamespace PI.

Providing qualifying attributes for properties would be done as part of the declaration of an architecture derived from the RDFbase architecture. These qualifiers are only relevant to processing that understands the particular property domain and are unique to the semantics of the interpretation of a particular property, so there's no reason to codify them at the RDFBase level (for example, any qualifiers could be just as easily defined as part of the property value itself). In other words, we simply use whatever qualifying attributes we need.

Duplicating the name-space reference on the subject element merely requires providing a declaration of the name space and replacing the direct URL reference with a reference to the name-space name:

<ablock>
 <?RDFnamespace domain='Dublin-Core'
                 href='http://www.oclc.org:5046/dublin_core/RDFschema' ?>
 <?RDFnamespace domain='LCSH'
                 href='http://purl.org/Schemas/LCSH' ?>
 <title lang='FIN'>Kurpitsojen ja URLien alkeet</title>
 <creator>Ora Lassila</creator>
 <language>FIN</language>
 <subject domain='LCSH'>
      Color and Color Palettes</subject>
 </ablock>
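The RDFnamespace declarations can be read as a lookup table from domain names to URLs. The Python sketch below is one hedged way to build that table and resolve a domain attribute against it. It assumes the subject element refers to the declared name 'LCSH', and the function names and the regular expressions are illustrative choices, not part of RDF or the AFDR.

```python
import re

INSTANCE = """<ablock>
 <?RDFnamespace domain='Dublin-Core'
                 href='http://www.oclc.org:5046/dublin_core/RDFschema' ?>
 <?RDFnamespace domain='LCSH'
                 href='http://purl.org/Schemas/LCSH' ?>
 <subject domain='LCSH'>
      Color and Color Palettes</subject>
 </ablock>"""

# Assumption: pseudo-attributes are always written name='value' with
# single quotes, as in the examples in this paper.
PI_PATTERN = re.compile(r"<\?RDFnamespace\s+(.*?)\s*\?>", re.DOTALL)
PSEUDO_ATTR = re.compile(r"([\w-]+)\s*=\s*'([^']*)'")


def namespace_table(text):
    """Map each declared domain name to the href given in its PI."""
    table = {}
    for match in PI_PATTERN.finditer(text):
        attrs = dict(PSEUDO_ATTR.findall(match.group(1)))
        table[attrs["domain"]] = attrs["href"]
    return table


def resolve_domain(table, domain):
    """Return the href for a domain name, or fail if it was never declared."""
    if domain not in table:
        raise KeyError(f"undeclared property domain: {domain}")
    return table[domain]


table = namespace_table(INSTANCE)
lcsh_href = resolve_domain(table, "LCSH")
```

Failing on an undeclared name is a design choice of this sketch; a real processor might instead fall back to a notation or architecture declaration of the same name.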

The lang attribute of the title property need simply be declared as part of the definition of the Dublin Core architecture, which requires the following new declarations for the title element form (modifying the meta-DTD shown in the previous example):

<!-- Modifications to Dublin Core architecture given above -->
<!element title (#PCDATA) >
<!attlist title
   lang    -- Name of natural language used for the title property value.
              Should be a 3-character ISO language code. --
     CDATA  #IMPLIED -- Default: determined by context --
   domain  NAME #FIXED 'Dublin-Core'
   RDFBase NAME #FIXED 'property'>

This change does nothing more than add the lang attribute and describe its use.

Dublin Core Example 3 (5.2.3)

The third Dublin Core example combines properties from two different property domains such that properties from one are sub-properties of another. In the example as given, prefixes are used to distinguish names in one domain from names in another. This is not strictly necessary with the architectural approach, as in this case the base property names don't conflict (although prefixes still might be useful to make what's going on in the instance clearer to observers). The distinction between the property sets is made through the architectural mappings of the different property-form elements. In particular, with respect to the Dublin Core architecture, the W3C properties will be captured as architectural bridging elements, which maintain the original element boundaries but don't express any special semantics.

To do this, we need to add an architectural bridging form to the Dublin Core architecture. This would look something like this:

<!ENTITY % property-value 'title | creator | language | property |
                           D-C.Bridge' >

The previous declaration adds the element form D-C.Bridge to the allowed content of property-form elements. The next declarations declare the D-C.Bridge element form. Note that it is derived from the RDFBridge element, which means that the bridging would carry through to RDF-specific processing as well as for Dublin Core-specific processing.


<!element D-C.Bridge -- Dublin Core architectural bridging element --
  ANY
>
<!attlist D-C.Bridge
   RDFBase NAME #FIXED 'RDFBridge'>

We now need to define the W3C properties, which we can also do as an architecture (again on the assumption that these properties will benefit from a more formal declaration because of wide use). I will leave the declaration of that architecture as an exercise for the reader (it should be clear that it simply involves copying one of the declaration sets given above and changing a few names here and there).

The original example document is:

<ablock>
 <namespace href='http://www.oclc.org:5046/dublin_core/RDFschema'/>
 <namespace href='http://www.w3.org/Library/RDFschema' as='w3c'/>
 <w3c::accessionId>199707301124301</w3c::accessionId>
 <title lang='FIN'>Kurpitsojen ja URLien alkeet</title>
 <creator>
   <ablock>
      <w3c::authorGivenName>Ora</w3c::authorGivenName>
      <w3c::authorSurname>Lassila</w3c::authorSurname>
      </ablock>
   </creator>
 <language>FIN</language>
 <subject namespace='http://purl.org/Schemas/LCSH'>
      Color and Color Palettes</subject>
 </ablock>

The architectural version of this document, which uses explicit element declarations (not shown), is shown below. In addition to the element declarations, the external DOCTYPE subset includes these namespace declarations:

<?RDFnamespace domain='Dublin-Core'
               href='http://www.oclc.org:5046/dublin_core/RDFschema' ?>
<?RDFnamespace domain='w3c-library'
               href='http://www.w3.org/Library/RDFschema'
               prefix='w3c' ?>
<?RDFnamespace domain='LCSH'
               href='http://purl.org/Schemas/LCSH' ?>

The document is:

<?XML 1.0 ?>
<!DOCTYPE Ablock SYSTEM 'my-ablock.dtd' >
<ablock>
 <w3c::accessionId>199707301124301</w3c::accessionId>
 <title lang='FIN'>Kurpitsojen ja URLien alkeet</title>
 <creator>
   <ablock>
      <w3c::authorGivenName>Ora</w3c::authorGivenName>
      <w3c::authorSurname>Lassila</w3c::authorSurname>
      </ablock>
   </creator>
 <language>FIN</language>
 <subject domain='LCSH'>
      Color and Color Palettes</subject>
 </ablock>

The only differences in the instance are the addition of the declarations, the movement of the namespace declarations to the external subset, and the reworking of the subject element to use the domain attribute.
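The prefix convention in the RDFnamespace declarations can be sketched in Python as follows. This is only an illustration of how an observer might read prefixed names: in the architectural approach the authoritative mapping comes from the element declarations, not from parsing names. The NAMESPACES table restates the three declarations in the external subset, and the choice of 'Dublin-Core' as the default domain for unprefixed names is an assumption of the sketch.

```python
# Assumption: this table restates the three RDFnamespace PIs; the third
# field is the prefix declared for the domain, or None if there is none.
NAMESPACES = [
    ("Dublin-Core", "http://www.oclc.org:5046/dublin_core/RDFschema", None),
    ("w3c-library", "http://www.w3.org/Library/RDFschema", "w3c"),
    ("LCSH", "http://purl.org/Schemas/LCSH", None),
]

PREFIX_TO_DOMAIN = {prefix: domain for domain, _, prefix in NAMESPACES if prefix}


def split_property_name(element_name, default_domain="Dublin-Core"):
    """Return (domain, property name) for a possibly prefixed element name."""
    prefix, separator, local = element_name.partition("::")
    if not separator:
        # No prefix: assume the name belongs to the default domain.
        return default_domain, element_name
    return PREFIX_TO_DOMAIN[prefix], local


accession = split_property_name("w3c::accessionId")
title = split_property_name("title")
```

Here accession resolves to the 'w3c-library' domain and title to the default domain.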

More interesting is the domain-specific architectural interpretation of the above document. This can be demonstrated by creating the architectural instances for both the Dublin Core and W3C Library architectures. The Dublin Core architectural instance is:

<!DOCTYPE Ablock SYSTEM 'dublin-core.mdt' >
<ablock>
 <property domain='w3c-library' propname='accessionId'>199707301124301</property>
 <title lang='FIN'>Kurpitsojen ja URLien alkeet</title>
 <creator>
   <ablock>
      <D-C.Bridge>Ora</D-C.Bridge>
      <D-C.Bridge>Lassila</D-C.Bridge>
      </ablock>
   </creator>
 <language>FIN</language>
 <property domain='LCSH' propname='subject'>
      Color and Color Palettes</property>
 </ablock>

The W3C Library architectural instance is:

<!DOCTYPE Ablock SYSTEM 'w3c-library.mdt' >
<ablock>
 <authorGivenName>Ora</authorGivenName>
 <authorSurname>Lassila</authorSurname>
</ablock>

Note that the W3C library architectural instance only includes those elements and data that apply directly to the W3C library schema, that is, the assertion block within the creator property--everything else is ignored for the purposes of applying W3C library-specific processing. Obviously, a system processing the entire document would combine the Dublin Core processing and W3C library processing in some way depending on the type of processing involved (whether that is forking the properties to different indexes, using different presentation style sheets, or what have you).
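The derivation of the Dublin Core architectural instance can be sketched in Python as a rename driven by a mapping table. The DC_MAPPING table is hypothetical: it stands in for the element declarations that are not shown in this paper. Dropping an unmapped element together with its content is a simplification made for this sketch and does not reproduce the full AFDR rules.

```python
def node(name, *children, **attrs):
    """Build a tiny element record; children are records or text strings."""
    return {"name": name, "attrs": attrs, "children": list(children)}


# Assumption: each document element type maps to a Dublin Core form plus
# the attribute values that the (unshown) declarations would fix for it.
DC_MAPPING = {
    "ablock": ("ablock", {}),
    "title": ("title", {}),
    "creator": ("creator", {}),
    "language": ("language", {}),
    "subject": ("property", {"propname": "subject"}),
    "w3c::accessionId": ("property", {"domain": "w3c-library",
                                      "propname": "accessionId"}),
    "w3c::authorGivenName": ("D-C.Bridge", {}),
    "w3c::authorSurname": ("D-C.Bridge", {}),
}


def derive(element, mapping):
    """Return the architectural counterpart of element, or None if unmapped."""
    if element["name"] not in mapping:
        return None  # simplification: unmapped elements are dropped whole
    form, fixed = mapping[element["name"]]
    children = []
    for child in element["children"]:
        if isinstance(child, str):
            children.append(child)
        else:
            derived = derive(child, mapping)
            if derived is not None:
                children.append(derived)
    return {"name": form, "attrs": {**element["attrs"], **fixed},
            "children": children}


document = node(
    "ablock",
    node("w3c::accessionId", "199707301124301"),
    node("title", "Kurpitsojen ja URLien alkeet", lang="FIN"),
    node("creator", node("ablock",
                         node("w3c::authorGivenName", "Ora"),
                         node("w3c::authorSurname", "Lassila"))),
    node("language", "FIN"),
    node("subject", "Color and Color Palettes", domain="LCSH"),
)

dc_instance = derive(document, DC_MAPPING)
forms = [child["name"] for child in dc_instance["children"]]
```

A W3C library derivation would use the same function with a different table, which is one way to picture a system combining the two kinds of processing.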

This is the most complex example so far and demonstrates the ability of architectures to formally define the relationships between different property domains that are combined in documents or new architectures. The machinery behind the covers is involved, but serves to simplify the instances. Everything that was done above with explicit declarations could have been done in the instance (that is, defining the name spaces and the mappings to the three architectures involved).

Note also that because domain-specific prefixes have been used in this case, an observer has some chance to make sense of the document without reference to the underlying (and hidden until asked for) architectural machinery. If they recognize 'ablock' as being from the RDF mechanism, they can guess that the contained elements are properties and go from there and probably be correct. It's only when the instance deviates significantly from the expected (but not required) structures that the architectural machinery is required in order to determine how to correctly interpret the document.

Manifests Examples

I am avoiding the manifests example because the RDF spec makes it clear that the idea is not fully cooked. In any case, the requirements raised by the manifests example are distinct from the property name domain requirements, and their satisfaction would probably not rely on architectural mechanisms. It's an interesting problem but outside the scope of this paper.

MARC Examples

At this point, it should either be clear how the MARC examples can be reworked to use architectures or not. If it's not, providing the example won't help. If there is interest in a fully worked out architectural version of the MARC example, I will be happy to provide it.

Summary

This paper should have demonstrated the utility of the AFDR architecture mechanism to satisfy the requirements of the RDF specification for representation of RDF-conforming metadata as XML and SGML documents.

I am in the process of preparing working processes for the examples provided here, including extracting RDF-defined metadata, formatting metadata using a style sheet defined in terms of the RDF architecture, and other processing tasks.


Generated from the original SGML using the JADE DSSSL engine. Style sheet developed by the author.