HHG to Schematron in PDS4

From The SBN Wiki
Revision as of 20:24, 3 August 2017 by Akash (talk | contribs) (categories)

Schematron is a schema definition language, similar in purpose to XSD (the XML Schema Definition Language). Schematron, however, is very good at defining co-occurrence constraints (for example, an exclusive-or relationship, where exactly one of two elements may appear) as well as conditional constraints (constraints applied only when a particular value is encountered).
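
To make the two kinds of constraint concrete, here is a minimal sketch in Python of what each one checks. It is not Schematron, only the logic such rules express, and the element names (observation, mode, option_a, option_b, filter_name) are hypothetical and not taken from any PDS4 dictionary.

```python
import xml.etree.ElementTree as ET


def exactly_one_of(parent, name_a, name_b):
    """Exclusive-or: one of the two children is present, but not both."""
    has_a = parent.find(name_a) is not None
    has_b = parent.find(name_b) is not None
    return has_a != has_b


def required_when(parent, trigger, trigger_value, required):
    """Conditional: `required` must exist whenever `trigger` equals `trigger_value`."""
    trigger_el = parent.find(trigger)
    if trigger_el is None or trigger_el.text != trigger_value:
        return True  # condition not triggered, so there is nothing to enforce
    return parent.find(required) is not None


# Hypothetical fragment: option_a is present without option_b, and mode is
# 'Filtered' but the filter_name that this mode calls for is missing.
obs = ET.fromstring(
    "<observation><mode>Filtered</mode><option_a>1</option_a></observation>"
)

print(exactly_one_of(obs, "option_a", "option_b"))            # True
print(required_when(obs, "mode", "Filtered", "filter_name"))  # False
```

In a real Schematron file, each of these checks would be written as an XPath test inside an assert, rather than as a function.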

PDS4 will use both XSD and Schematron files to validate incoming labels. Data preparers who are compiling local dictionaries may also wish to generate a Schematron file to complement their dictionary. This is not compulsory. If one is needed, the dictionary creation tool will write one.

What It Is

Schematron is an XML-based schema language that uses the XPath standard notation to test specific attributes in PDS4 labels. It's a validation tool, and as such provides some level of documentation (if you can decipher it) of the overall properties of the data.

What It Is Not

Schematron is not a substitute for XSD. You cannot define your label structure only by using Schematron files. Neither should you use Schematron to constrain things that could easily be constrained in the XSD file - like data types.

Basic Requirements

If you're not creating the local data dictionary for your mission or discipline, then you probably don't need to think about Schematron files at all, except as a validation step.

If you are preparing a local dictionary, the PDS4 local dictionary tool will create a Schematron file for you as needed, and you probably don't need to think about it much beyond that.

If you find you need to define some fairly complex constraints to validate the use of local, or even non-local, attributes in your labels, then you will need to get to know Schematron rather well. That level of detail is beyond this simple guide. Schematron coding is along the lines of regular expression coding: you need to be knowledgeable about both the Schematron syntax and the underlying symbolic logic describing the situations you want to validate.

An Example

Here is a snippet of Schematron code from the Schematron file accompanying one of the pre-release versions of the PDS4 Master Schema ("sch" is the namespace prefix associated with Schematron tags):

    <sch:rule context="pds:Array_2D_Image">
      <sch:assert test="pds:axes = ('2')">
        The attribute pds:axes must be equal to the value '2'.
      </sch:assert>
      <sch:assert test="pds:encoding_type = ('Binary', 'Character')">
        The attribute pds:encoding_type must be equal to one of the following values 'Binary', 'Character'.
      </sch:assert>
      <sch:assert test="pds:axis_index_order = ('Last_Index_Fastest')">
        The attribute pds:axis_index_order must be equal to the value 'Last_Index_Fastest'.
      </sch:assert>
    </sch:rule>

Here's what's going on (note that my understanding of this topic is far from perfect):

The <sch:pattern> tag, which is not shown in the snippet above, is a grouping mechanism for the rules: the <sch:rule> elements sit inside it.

Each rule is applied only when the processor encounters an XML element that matches the rule's context. So the rule above is triggered when a <pds:Array_2D_Image> element is encountered. (Namespace prefixes are bound to namespace URIs in Schematron files, much as they are in the labels, and matching is done on the URI. The pds: prefix therefore need not appear in the label file itself; for example, the label may declare the PDS namespace as its default namespace.)
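
The point about prefixes can be checked with a short Python sketch using the standard library's ElementTree parser. The namespace URI below is an assumption (it is the PDS4 core namespace as I understand it, and pre-release versions may have used a different one), and the two label fragments are stripped-down illustrations rather than valid labels.

```python
import xml.etree.ElementTree as ET

# Assumed URI for the PDS4 core namespace.
PDS = "http://pds.nasa.gov/pds4/pds/v1"

# The same fragment written twice: once with an explicit pds: prefix,
# once with the PDS namespace declared as the default namespace.
PREFIXED = (
    '<pds:Array_2D_Image xmlns:pds="%s">'
    "<pds:axes>2</pds:axes>"
    "</pds:Array_2D_Image>" % PDS
)
DEFAULT = (
    '<Array_2D_Image xmlns="%s">'
    "<axes>2</axes>"
    "</Array_2D_Image>" % PDS
)


def expanded_names(xml_text):
    """Return each element's name in {namespace-uri}local-name form."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


print(expanded_names(PREFIXED))
print(expanded_names(PREFIXED) == expanded_names(DEFAULT))  # True
```

Both fragments parse to the same namespace-qualified element names, which is why a rule whose context is written with the pds: prefix can still match a label that never uses that prefix.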

The <sch:assert> statements are the workhorses here. The first one checks two things about an Array_2D_Image class: that it contains an attribute from the PDS core namespace called axes, and that this attribute has a value of "2". If either of those things isn't true, the text content of the sch:assert tag is displayed as an error message.

The second assert requires that the pds:encoding_type attribute be present with a value of either "Binary" or "Character" (the comparison is case-sensitive). The third assert requires that the pds:axis_index_order attribute be present with the value "Last_Index_Fastest".
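
As a rough illustration of that logic, the following Python sketch emulates the three asserts with the standard library's ElementTree. It is not a Schematron processor and is not how PDS4 validation is actually implemented. The namespace URI, the abbreviated messages, and the sample fragment are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed URI for the PDS4 core namespace, bound here to the pds: prefix.
PDS = "http://pds.nasa.gov/pds4/pds/v1"
NS = {"pds": PDS}

# One entry per assert: (child to test, allowed values, abbreviated message).
ASSERTS = [
    ("pds:axes", {"2"}, "pds:axes must be '2'"),
    ("pds:encoding_type", {"Binary", "Character"},
     "pds:encoding_type must be 'Binary' or 'Character'"),
    ("pds:axis_index_order", {"Last_Index_Fastest"},
     "pds:axis_index_order must be 'Last_Index_Fastest'"),
]


def check_array_2d_image(label_xml):
    """Return the message of every assert that fails, in order."""
    root = ET.fromstring(label_xml)
    failures = []
    # The rule's context: every Array_2D_Image element in the PDS namespace.
    for ctx in root.iter("{%s}Array_2D_Image" % PDS):
        for path, allowed, message in ASSERTS:
            values = [el.text or "" for el in ctx.findall(path, NS)]
            # A missing child yields an empty list, so the assert fails,
            # just as it does when the value is present but not allowed.
            if not any(value in allowed for value in values):
                failures.append(message)
    return failures


# Hypothetical fragment: axes is fine, encoding_type has the wrong case,
# and axis_index_order is missing altogether.
SAMPLE = """<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1">
  <Array_2D_Image>
    <axes>2</axes>
    <encoding_type>binary</encoding_type>
  </Array_2D_Image>
</Product_Observational>"""

for message in check_array_2d_image(SAMPLE):
    print(message)
```

Run on the sample, the sketch reports the encoding_type and axis_index_order failures: the first because "binary" differs in case from "Binary", the second because the attribute is absent.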