HHG to XML Schema Definition Language in PDS4

From The SBN Wiki
Revision as of 20:04, 3 August 2017 by Akash (talk | contribs) (categories)

As explained in the HHG to the eXtensible Markup Language in PDS4, XML is a syntax standard. It defines how to identify markup tags, but it does not define what the tag names are or how they are to be interpreted. In order to do that, an organization like PDS needs to define tag names and their significance for processing. There are a number of ways to do this.

Document Type Definitions (DTDs) were popular for a long while. They could be embedded in the document or provided in a separate file. But DTDs were not the best fit for PDS, where we have fairly elaborate constraints that we want validators to enforce in all labels.

Schema languages are another approach to this tag-definitions problem. PDS decided to use the XML Schema Definition Language (XSD) for defining the content of labels. It is a large, complex, and powerful system. Fortunately, most data preparers will never need to write XSD schema files from scratch, but it is useful to know how to read them.

What It Is

XSD is itself an XML-based language (that is, it uses a set of tags defined by the XSD standard to mark up the various content requirements and definitions). PDS uses it to translate the PDS4 Information Model into a series of classes, subclasses and attributes with specific content requirements. This schema can then be used to create new labels and to validate the content of the labels created.

XSD includes a set of basic data types, so that you can specify, for example, that the <start_time> attribute has a value that conforms to the ISO time standard format. It also provides ways to restrict these data types, so you can specify that your <photon_count> must be an integer greater than zero but less than the saturation value of your detector.
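
To give a feel for what this looks like when you read a schema file, here is a minimal sketch. It is not taken from the PDS Master Schema: the element declarations are simplified for illustration, the use of the built-in xs:dateTime type for <start_time> is an assumption, and the saturation value of 65535 is made up.

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- A built-in type used directly: xs:dateTime accepts ISO 8601 style
         values such as 2005-07-04T05:44:34Z -->
    <xs:element name="start_time" type="xs:dateTime"/>

    <!-- A built-in type narrowed by restriction: integers greater than 0
         and less than 65535 (a hypothetical saturation value) -->
    <xs:element name="photon_count">
      <xs:simpleType>
        <xs:restriction base="xs:integer">
          <xs:minExclusive value="0"/>
          <xs:maxExclusive value="65535"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

  </xs:schema>

With a definition like this, a validator would accept <photon_count>1200</photon_count> and reject <photon_count>0</photon_count> or <photon_count>70000</photon_count>.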

XSD is strictly ordered. If your XSD schema file says that <start_time> comes before <stop_time>, but you have it the other way around in your label, the validator will flag it as an error.
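
The ordering usually comes from an xs:sequence block in the schema. The sketch below shows the general pattern only; the class name Time_Range and the xs:dateTime types are assumptions made for this illustration, not a copy of any PDS4 definition.

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <!-- Time_Range is a hypothetical class name used only for this sketch -->
    <xs:element name="Time_Range">
      <xs:complexType>
        <!-- xs:sequence: the child elements must appear in exactly this order -->
        <xs:sequence>
          <xs:element name="start_time" type="xs:dateTime"/>
          <xs:element name="stop_time" type="xs:dateTime"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

  </xs:schema>

A label fragment that lists <stop_time> ahead of <start_time> inside <Time_Range> would fail validation against this schema, even though both values are present and well formed.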

What It Is Not

Pretty.

XSD specifications tend to be wordy, and sometimes the markup can seem to overwhelm the content. This fades with familiarity. XML-aware editors also make it easier to navigate and visualize the XSD content.

While XSD is very useful for validating the presence or absence of specific PDS4 classes and attributes, it is not very adept at exclusive-or dependencies ("either this attribute or that one, but not both"), or at validity checks that are contingent on the actual content of one or more elements (e.g., "if the instrument name is 'Wally', then the <mode_ID> must be either 'fast' or 'slow'").

Basic Requirements

Most data preparers will be able to avoid writing or even modifying XSD schema files themselves. But you will need to know how to reference them and will likely want to know how to get useful information out of them - in particular, the PDS Master Schema.

Writing even simple XSD definitions is beyond the scope of a hitchhiker's guide. XML Schema files can be loaded into a browser or any XML-aware editor for viewing. How easy it is to read the file will depend heavily on how sophisticated your browser or editor is, which is also beyond the capabilities of this quick guide to cover.

XSD files in PDS4 are generally referenced by the XML Namespace they identify. (You can see examples of namespace referencing in HHG to Namespaces in PDS4.) To turn the namespace reference into a reference to a specific schema file, you may need to include schemaLocation hints, set up an XML Catalog file, or do some other sort of environmental configuration. Unfortunately, there is not yet a single settled approach to resolving schema references.
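
As a rough idea of the XML Catalog option, here is a minimal sketch in the OASIS XML Catalog format. It maps a namespace name to a local copy of a schema file. The namespace and file name are the ones used in the Example section; the local "schemas/" directory is hypothetical, and whether and how a particular validator or editor consults a catalog is something you will need to check in that tool's documentation.

  <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

    <!-- Map the namespace name to a local copy of the schema file.
         The local path is hypothetical. -->
    <uri name="http://pds.nasa.gov/pds4/pds/v08"
         uri="schemas/PDS4_PDS_0800k.xsd"/>

  </catalog>

One advantage of a catalog is that the labels themselves do not need to change when you move or update your local schema copies.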

Example

Here's what the example from the HHG to Namespaces in PDS4 might look like if you also include the schemaLocation attribute:

  <Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1"
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
                         xmlns:sbn="http://pds.nasa.gov/pds4/sbn/v1" 
                         xmlns:img="http://pds.nasa.gov/pds4/img/v1"   
                         xmlns:di="http://pds.nasa.gov/pds4/mission/DeepImpact/v1"  
     xsi:schemaLocation="http://pds.nasa.gov/pds4/pds/v08 http://pds.nasa.gov/pds4/pds/v08/PDS4_PDS_0800k.xsd
                         http://pds.nasa.gov/pds4/sbn/v1 http://pds.nasa.gov/pds4/sbn/v1/sbnDD.xsd 
                         http://pds.nasa.gov/pds4/img/v1 http://pds.nasa.gov/pds4/img/v1/imgDD.xsd 
                         http://pds.nasa.gov/pds4/mission/DeepImpact/v1 http://pds.nasa.gov/pds4/mission/DeepImpact/v1/diDD.xsd">

Because PDS has decided to maintain the namespace URLs as "live" directories, the schemaLocation hint can append the name of the file containing the defining schema to the namespace URL, and software could potentially download that file for use in processing or validating the label.

That all assumes that the software being used supports that sort of activity.