HHG to Schematron in PDS4

From The SBN Wiki

Schematron is a schema definition language, similar to XSD (the XML Schema Definition Language). Schematron, however, is particularly good at defining constraints involving co-existence (for example, an exclusive-or relationship between elements) as well as conditional constraints (constraints applied only when a particular value is encountered).

A PDS4 dictionary is actually defined by two files: an XSD file and a Schematron file. This is true for all dictionaries: the PDS4 core, the discipline dictionaries, and local mission or project dictionaries.

What It Is

Schematron is an XML-based schema language that uses the XPath standard notation to test specific attributes in PDS4 labels. It's a validation tool, and as such provides some level of documentation (if you can decipher it) of the overall properties of the data.

What It Is Not

Schematron is not a substitute for XSD. You cannot define your label structure only by using Schematron files. Neither should you use Schematron to constrain things that could easily be constrained in the XSD file - like data types.

The Basics

If you're not creating the local data dictionary for your mission or project, then you probably don't need to think about Schematron files at all, except as a validation step.

If you are preparing a local dictionary, the PDS4 local dictionary tool will create a Schematron file for you. If your dictionary is very simple, you may not need to do anything specifically with Schematron.

If you find you need to define some fairly complex constraints to validate the use of various attributes in your labels, then you will need to get to know Schematron, and more specifically XPath, rather well. That level of detail is beyond this simple guide. XPath coding for Schematron is along the lines of regular expression coding: you need to be knowledgeable about both the Schematron syntax and the underlying symbolic logic describing the situations you want to validate. There is additional information on this wiki on the Creating the Ingest LDD Dictionary Input File pages.

An Example

Here is a snippet of Schematron code from the Schematron file accompanying one of the pre-release versions of the PDS4 Master Schema ("sch" is the namespace prefix associated with Schematron tags):

  <sch:pattern>
    <sch:rule context="pds:Array_2D_Image">      
      <sch:assert test="pds:axes = ('2')">        
        The attribute pds:axes must be equal to the value '2'.
      </sch:assert>
      <sch:assert test="pds:encoding_type = ('Binary', 'Character')">
        The attribute pds:encoding_type must be equal to one of the following values 'Binary',   
        'Character'.
      </sch:assert>
      <sch:assert test="pds:axis_index_order = ('Last_Index_Fastest')">
        The attribute pds:axis_index_order must be equal to the value 'Last_Index_Fastest'.  
      </sch:assert>
    </sch:rule>
  </sch:pattern>

Here's what's going on (note that my understanding of this topic is far from perfect):

The <sch:pattern> tag is a grouping mechanism for the rules.

Each rule is applied only when the processor hits an XML element matching the given context. So the rule above is triggered when a <pds:Array_2D_Image> element is encountered. (Namespace prefixes are bound to namespace URIs in Schematron files, just as they are in the labels, and matching is done on the URI rather than on the prefix. So the pds: prefix need not appear in the label file if, for example, the PDS namespace is declared as the default namespace in the label.)
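To make the namespace point concrete, here is a minimal sketch of how the top of a Schematron file might bind the pds prefix. The sch:ns element is the standard Schematron way to declare a prefix. The URIs shown are the ISO Schematron namespace and the PDS4 core namespace as I understand them, the queryBinding value is an assumption, and the title is made up, so check all of these against the actual dictionary files.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Title is illustrative only -->
  <sch:title>Example rules for a PDS4 dictionary</sch:title>

  <!-- Bind the "pds" prefix to the PDS4 core namespace URI (assumed value) -->
  <sch:ns prefix="pds" uri="http://pds.nasa.gov/pds4/pds/v1"/>

  <sch:pattern>
    <!-- "pds:" here refers to the URI bound above, not to whatever prefix the label uses -->
    <sch:rule context="pds:Array_2D_Image">
      <sch:assert test="pds:axes = '2'">
        The attribute pds:axes must be equal to the value '2'.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

With this binding, the rule would match an unprefixed <Array_2D_Image> element in any label that declares the same URI as its default namespace.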

The <sch:assert> statements are the workhorses here. The first one checks that an Array_2D_Image class contains an attribute from the PDS core namespace called axes, and that this attribute has a value of "2". (In PDS4 terms an "attribute" is an XML element in the label, not an XML attribute.) If either of those things isn't true, the message comprising the content of the sch:assert tag will be displayed as an error message.
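For reference, a label fragment along the following lines would pass the first assertion (and, as it happens, the other two in the rule as well). This is a hypothetical fragment built from the rule itself, not copied from a real label. Any other elements the class requires are omitted, and it assumes the PDS core namespace URI shown is correct and is declared as the default namespace.

```xml
<!-- Hypothetical fragment; elements not mentioned in the rule are omitted -->
<Array_2D_Image xmlns="http://pds.nasa.gov/pds4/pds/v1">
  <axes>2</axes>
  <axis_index_order>Last_Index_Fastest</axis_index_order>
  <encoding_type>Binary</encoding_type>
</Array_2D_Image>
```

Changing <axes>2</axes> to <axes>3</axes>, or removing the element altogether, would cause the first message to be reported.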

The second assert requires that the pds:encoding_type attribute be present with a value of either "Binary" or "Character" (case counts). The third assert requires that the pds:axis_index_order attribute be present and have the value given.
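The example above only checks fixed values. For comparison, here is a rough sketch of the two kinds of constraint mentioned at the top of this page: an exclusive-or and a conditional constraint. The ex prefix and all of the class and attribute names are invented for illustration and are not from any real dictionary. The sketch assumes the ex prefix has been bound to a namespace with an sch:ns element elsewhere in the file.

```xml
<sch:pattern>
  <!-- Exclusive-or: exactly one of the two (hypothetical) attributes must be present -->
  <sch:rule context="ex:Observation">
    <sch:assert test="(ex:start_time and not(ex:start_clock_count)) or
                      (ex:start_clock_count and not(ex:start_time))">
      ex:Observation must contain either ex:start_time or ex:start_clock_count, but not both.
    </sch:assert>
  </sch:rule>

  <!-- Conditional: this rule only fires when ex:filter_mode has the value 'Custom' -->
  <sch:rule context="ex:Filter[ex:filter_mode = 'Custom']">
    <sch:assert test="ex:custom_filter_name">
      ex:custom_filter_name is required when ex:filter_mode is 'Custom'.
    </sch:assert>
  </sch:rule>
</sch:pattern>
```

In the second rule the condition lives in the context expression, so labels with any other ex:filter_mode value are never tested against the assertion. Neither constraint is easy to express in an XSD file, which is why they end up in Schematron.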