Schema Referencing in PDS4 Labels

From The SBN Wiki

There are several ways to tie schema documents to the XML files they define, both to validate those files and to take advantage of schema-aware editors, but in general these methods are not compatible with each other. In other words, anyone editing XML needs to pick one method and then use it consistently. Changing methods generally involves changing software settings, editing the schema references in the XML files, or both.

The PDS4 schema library is relatively complex and interlinked. That is, the PDS4 dictionary schemas (the ones that define the core PDS and discipline namespaces as well as the mission dictionaries) cross-reference each other. For any particular software environment to resolve all schema references reliably, it is important that the same technique be used in all dictionary schemas and all label files, regardless of source. This must also be done in an environment-agnostic way, or you will have to edit schema files each time you try to run validation on a new machine, or even in a new directory on the same disk.

This page describes how to set up your PDS4 labels to be consistent with the PDS schema library and remove environmental dependence from your schema references as far as possible. This method is very strongly recommended to PDS4 data preparers. In fact, your node consultant may insist on it in order to have consistent and reliable validation of your deliveries.


Preliminaries

PDS-controlled namespaces will almost always be defined by a pair of related schema files: an XML Schema (.xsd) file to define the class structures and general data types; and a Schematron (.sch) file to define enumerated value lists and conditional structure relationships (e.g., you must use PDS attribute A or PDS attribute B, but not both). You will need to tie your labels to both of these files. The Schematron file will be referenced in the XML prolog; the XSD file will be referenced in the document root tag (<Product_Observational>, for example).
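
As a rough sketch of where the two references sit, the skeleton below shows a prolog carrying a Schematron reference and a root tag carrying the XSD reference. It assumes the common xsi:schemaLocation convention for the XSD reference, and the .xsd file name is an assumption made by analogy with the .sch file name used on this page. The label body is omitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Prolog: Schematron file referenced by an xml-model processing instruction -->
<?xml-model href="http://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1201.sch"?>
<!-- Root tag: namespace declared with xmlns; the XSD file (assumed name) is
     paired with that namespace in xsi:schemaLocation -->
<Product_Observational
    xmlns="http://pds.nasa.gov/pds4/pds/v1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://pds.nasa.gov/pds4/pds/v1 http://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1201.xsd">
  <!-- label content omitted -->
</Product_Observational>
```

Note that the xmlns value has no file name (it names the namespace), while the second half of the xsi:schemaLocation pair and the href value both end in a file name (they name schema files).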

Note: Schema File vs. Namespace

URIs (Uniform Resource Identifiers) are used to identify both namespaces and the files that define those namespaces. While it is easy, given the notational conventions described below, to conflate these two things, they are and remain very different concepts to your software. The namespace URI is a logical identifier - it refers to the concept of the dictionary, irrespective of minor version changes. That is, version 1.3 of the PDS core namespace, for example, has exactly the same URI as version 1.5 of the same namespace. (Version 2.0, though, would be a different namespace.)

The schema URIs, however, must resolve to physical files. It is the schema URIs that control the version of the namespace actually applied to the label for editing assistance and for validation.

The practical upshot for PDS4 labels is that when you are referencing a schema file, your URI will contain a file name. When you are referencing a namespace, it will not. And in order to allow for reasonable transportability, file system references will be replaced by URI references that can be resolved through an XML Catalog file.
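
A minimal sketch of an XML Catalog file that would let a catalog-aware tool resolve schema-file URIs to local copies. The local schemas/ directory and the .xsd file name are hypothetical, and which entry type a given tool consults (uri, system, or a rewrite rule) varies, so treat this as a starting point rather than a definitive configuration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Each entry maps a schema-file URI (name) to a local copy (uri).
       The schemas/ directory is a hypothetical location. -->
  <uri name="http://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1201.sch"
       uri="schemas/PDS4_PDS_1201.sch"/>
  <uri name="http://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1201.xsd"
       uri="schemas/PDS4_PDS_1201.xsd"/>
</catalog>
```

Relative uri values are resolved against the location of the catalog file itself, so the catalog and the schema directory can be moved together without editing any labels.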

Schematron References

Schematron references are placed in the prolog of the document, following the XML declaration. Schematron files are referenced by xml-model processing instructions. (The prolog is everything before the document root tag; processing instructions are delimited by the character pairs <? and ?>, the same as for the XML declaration.)

The xml-model processing instruction is the focus of a relatively new (first proposed in 2010; last revised 2012) W3C standard "Associating Schemas with XML Documents". It exists to provide an explicit link between an XML document and the schema that define(s) its valid content. PDS uses the xml-model processing instruction to associate Schematron-type schema files, specifically, with a label. (The XSD schema files are associated through namespace declarations.)

If your software (your editor, for example) has implemented the "Associating Schemas" standard, then you should use one of these two forms for xml-model in your PDS4 labels:

<?xml-model href="http://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1201.sch"?>

or:

<?xml-model href="http://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1201.sch" schematypens="http://purl.oclc.org/dsdl/schematron"?>

which adds a bit of optional information. Here's what's going on:

href="http://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1201.sch"
href is required, and to be compliant with the "Associating Schemas" standard, it must be a URI that maps to a physical file. However, because the value of href is a URI, editors that implement the XML Catalog standard along with the schema association standard should use any relevant XML catalog entries to help resolve the href reference. You should keep that in mind when formulating your XML catalog entries.
Also, the href value must be a single URI, so you will need to include one xml-model statement for every Schematron file you wish to associate with the label. Start each xml-model statement on a new line to avoid confusion and trouble down the line.
schematypens="http://purl.oclc.org/dsdl/schematron"
The optional schematypens attribute gives any software that cares to check a hint about what kind of schema it can expect to find when it resolves the href URI to a physical file. The namespace shown here is the official namespace URI for ISO Schematron - the version used in PDS4 dictionaries. In the absence of schematypens, any particular processing routine would have to try to decipher the referenced file type by something like file extension or the initial content inside the file.
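
For illustration, a prolog fragment that associates two Schematron files with one label might look like the following. Only the first href comes from this page; the second is a made-up mission dictionary reference whose namespace path and file name are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- One xml-model statement per Schematron file, each on its own line -->
<?xml-model href="http://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1201.sch" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<!-- Hypothetical mission dictionary Schematron file, for illustration only -->
<?xml-model href="http://pds.nasa.gov/pds4/mission/example/v1/PDS4_EXAMPLE_1000.sch" schematypens="http://purl.oclc.org/dsdl/schematron"?>
```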
Note for Eclipse Users: The Eclipse editor and its Schematron plug-in have a couple of significant limitations:
  1. The href value must be a physical file location relative to the label in the current disk space. Web references and URIs will not resolve, even with XML catalog file entries available, and absolute file references don't seem to work, either. This is a major drawback with Eclipse if you need schema references that are environment-independent.
  2. The presence of a schematypens pseudo-attribute will be flagged as an error.
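
Given those two limitations, a reference that Eclipse accepts would presumably look like the sketch below: a path relative to the label, with no schematypens. The assumption here is that a copy of the Schematron file sits in the same directory as the label.

```xml
<!-- Relative file path; assumes PDS4_PDS_1201.sch is in the label's own directory -->
<?xml-model href="PDS4_PDS_1201.sch"?>
```

The trade-off is that the label now depends on its surroundings, so the href would need to be changed back to a URI wherever environment-independent references are required.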

There are other optional pseudo-attributes for xml-model that are unlikely, at least as of this writing, to show up in PDS4 labels, but they do at least have a format definition in the "Associating Schemas" standard. The ones you're most likely to see include:

  • type: The value should be a content-type descriptor like those you would find in an HTTP header.
  • charset: The value specifies a character set using standard abbreviations like "US-ASCII" or "UTF-8".
  • title: The value is the title of the schema document being referenced by href.
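
A sketch of an xml-model statement carrying these optional pseudo-attributes. The title text is invented for illustration, and application/xml is used here as a generic content type for an XML-based schema file; none of the three is needed for the forms shown earlier on this page.

```xml
<?xml-model href="http://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1201.sch"
            type="application/xml"
            charset="UTF-8"
            title="PDS4 core Schematron rules (illustrative title)"
            schematypens="http://purl.oclc.org/dsdl/schematron"?>
```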