HHG to XML Schema Definition Language in PDS4

From The SBN Wiki
Revision as of 16:46, 19 November 2012 by Raugh (talk | contribs) (Creation - Safety Save)

As explained in the HHG to the eXtensible Markup Language in PDS4, XML is a syntax standard. It defines how to identify markup tags, but it does not define what the tag names are or how they are to be interpreted. In order to do that, an organization like PDS needs to define tag names and their significance for processing. There are a number of ways to do this.

Document Type Definitions (DTDs) were popular for a long while. They could be embedded into the document or provided in a separate file. But DTDs were not the best fit for PDS, where we have fairly elaborate constraints we want validators to enforce in all labels.

Schema languages are another approach to this tag-definitions problem. PDS decided to use the XML Schema Definition Language (XSD) for defining the content of labels. It is a large, complex, and powerful system. Fortunately, most data preparers will never need to write XSD schema files from scratch, but it is useful to know how to read them.

What It Is

XSD is itself an XML-based language (that is, it uses a set of tags defined by the XSD standard to mark up the various content requirements and definitions). PDS uses it to translate the PDS4 Information Model into a series of classes, subclasses and attributes with specific content requirements. The resulting schema can then be used both to create new labels and to validate the content of those labels.

XSD includes a set of basic data types, so that you can specify, for example, that the <start_time> attribute has a value that conforms to the ISO time standard format. It also provides ways to restrict these data types, so you can specify that your <photon_count> must be an integer greater than zero but less than the saturation value of your detector.
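
As a rough illustration of both ideas, here is a minimal sketch in generic XSD. It is not taken from the PDS4 schema: the element declarations are simplified, and the saturation value of 65535 is a made-up number standing in for whatever your detector actually uses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Built-in type: xs:dateTime accepts ISO 8601 values
       such as 2012-11-19T16:46:00Z -->
  <xs:element name="start_time" type="xs:dateTime"/>

  <!-- Restricted type: an integer greater than zero and below saturation.
       65535 is a hypothetical saturation value, used only for illustration. -->
  <xs:element name="photon_count">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minExclusive value="0"/>
        <xs:maxExclusive value="65535"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

</xs:schema>
```

Reading from the inside out: xs:restriction names the base type being narrowed, and each facet inside it (here xs:minExclusive and xs:maxExclusive) adds one constraint on the allowed values.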

XSD is strictly ordered. If your XSD schema file says that <start_time> comes before <stop_time>, but you have it the other way around in your label, the validator will flag it as an error.
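
The ordering rule can be sketched as follows, assuming the class is declared with xs:sequence. The class name Observation_Times is hypothetical and chosen only for this example; it is not a PDS4 class definition.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Observation_Times is a hypothetical class name used for illustration. -->
  <xs:element name="Observation_Times">
    <xs:complexType>
      <!-- xs:sequence: the children must appear in exactly this order -->
      <xs:sequence>
        <xs:element name="start_time" type="xs:dateTime"/>
        <xs:element name="stop_time" type="xs:dateTime"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A label fragment like the one below has valid values but the wrong order, so a validator working from the schema above should reject it:

```xml
<Observation_Times>
  <stop_time>2012-11-19T17:00:00Z</stop_time>
  <start_time>2012-11-19T16:46:00Z</start_time>
</Observation_Times>
```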

What It Is Not

Pretty.

XSD specifications tend to be wordy, and sometimes the markup can seem to overwhelm the content. This fades with familiarity. XML-aware editors also make it easier to navigate and visualize the XSD content.

While XSD is very useful for validating the presence or absence of specific PDS4 classes and attributes, it is not very adept at exclusive-or dependencies ("either this attribute or that one, but not both"), or at validity checks that are contingent on the actual content of one or more elements (e.g., "if the instrument name is 'Wally', then the <mode_ID> must be either 'fast' or 'slow'").
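
To make the second limitation concrete, here is a sketch of the part of the 'Wally' rule that XSD can express on its own: an unconditional list of allowed values. This is generic XSD written for illustration, not a fragment of the PDS4 schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Unconditional rule: mode_ID must always be 'fast' or 'slow'. -->
  <xs:element name="mode_ID">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="fast"/>
        <xs:enumeration value="slow"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

Note what is missing: in XSD 1.0, nothing in this declaration can look at the value of a neighboring element such as the instrument name, so the "if the instrument name is 'Wally'" half of the rule has no place to go. The restriction applies to every <mode_ID>, whatever instrument it belongs to.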

Basic Requirements

Most data preparers will be able to avoid writing or even modifying XSD schema files themselves. But you will need to know how to reference them, and you will likely want to know how to get useful information out of them, in particular out of the PDS Master Schema.