Eclipse: Creating a New XML File from an XSD Schema File

From The SBN Wiki

A common task for PDS4 data preparation is creating a new XML label file. Some will be one-off labels for things like context objects, documents, bundles, and collections. Others will be turned into templates to use as input to label generation software.


The following instructions assume you have already downloaded the latest version of the PDS4 master schema (or the specific version you're working with, if not the latest release), and have added an XML Catalog entry to use it to define the PDS namespace, as described in Configuring XML Schema validation.
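
For orientation, here is a rough sketch of what such a catalog entry can look like when written out in OASIS XML Catalog format. Eclipse manages its catalog through the preferences dialogue, so you would not normally type this by hand. The namespace URI shown is the one used by released PDS4 schemas and is an assumption here (a beta schema may declare a different one; check the targetNamespace in your .xsd file), and the schema path is illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the namespace URI and the schema path are assumptions. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Maps a namespace name to a local schema file. -->
  <uri name="http://pds.nasa.gov/pds4/pds/v1"
       uri="Schemas/PDS4_PDS_0900B.xsd"/>
</catalog>
```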

The example below uses a pre-release, beta-test version of the PDS4 master schema. Some of the details of the schema content you see might not be valid under official releases, but these details are not relevant to the process we're describing - so try to ignore them...


Here are the files I'm starting with in my "Demo" project. I'll be using the PDS master schema file called PDS4_PDS_0900B.xsd as the basis for my new XML label file.


Select New->XML File

I want to create a new file in my main project directory, so I call up the context menu for Demo (right-clicking on my Windows system, for example), and select New->XML File.


This will open a New XML File dialogue box. Make sure the "parent folder" name above the center box is the one you want, and type in the name for the new file. Note that all XML label files should have an extension of ".xml".


Select "Create from Schema"

Now, instead of clicking the Finish button, click the Next> button, to show the creation options. Click the Create XML File from an XML Schema file radio button:


Then click Next> to get to the schema selection dialogue.

Select a Schema File


You're presented with two options. Either way you'll end up in the same place, but here's how they work.

The first option is the Select file from Workspace option, which will let you select a schema (.xsd) file from someplace (anyplace) in your workspace. Here I've selected the PDS4 master schema file in my "Demo" project "Schemas" subdirectory:


Alternatively, if I click the Select XML Catalog entry button, I'm presented with a list of every namespace currently listed in the Eclipse XML Catalog file, including the entry I created for the PDS4 namespace, which I can then select:


Either way, clicking on Next> should take you to the same place:


Select Root Element

The first thing to change is the Root element, shown as a pull-down list near the top. Select the specific product type from the list. Never use Label as the root element of your label. I'll select Product_Observational, for labeling an observational data file.

Select Content to be Created

Next come check boxes for Content options. I generally uncheck all of these, but that's my personal work preference. Once you've done a few of these, you'll develop some preferences of your own. Mainly it comes down to whether you want to add things to the minimal required set, or delete things from the set of all possible options.

Specify Namespaces and Prefixes

Finally, we come to the Namespace Information box. I'm not a big fan of single-letter abbreviations and the PDS preference is for standardized namespace abbreviations, so I'm going to start by selecting the namespace that's showing and clicking the Edit... button:


This opens another dialogue in which you're only allowed to edit the prefix abbreviation. Since most of the keywords in a PDS Label are from the PDS namespace, I'm going to delete the default "p" and leave it blank. This is also a personal preference - you could equally well change the "p" to "pds" (in which case all the PDS4 element names will be prefixed with "pds:" - for example, the root element would then be <pds:Product_Observational>). Here's what the "no prefix" option looks like before I hit Next>:


And here's what it looks like after:


Now, if I also know I'll be referencing other namespaces, it's convenient to add their info here as well. For example, to add the SBN dictionary namespace, I click the Add... button and click the Specify New Namespace option at the top of the resulting dialogue:


The prefix will be the standard prefix for the SBN namespace, which is "sbn". Unfortunately, Eclipse doesn't give you any auto-fill help with the actual namespace name, so you'll have to type it carefully. I've typed it in for the SBN namespace, below. Finally, since locations are machine-specific and we don't actually want machine-specific information in archival labels (and we're using the XML Catalog file entries to resolve namespace URIs to a physical file reference), the Location Hint: should remain blank:


Here's the result:


Note that each namespace must have a unique prefix, and only one namespace can be the default namespace (i.e., have no prefix). You can provide a prefix for all namespaces, if you like. You can also change, add and delete additional namespaces and prefixes after you create the label, if needed (although changing a prefix after you've added namespace elements can be an error-prone operation).
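
To illustrate the "prefix for all namespaces" alternative, here is a minimal sketch of a root element in which no namespace is the default. Both namespace URIs are assumptions: the PDS one is the form used by released schemas, and the SBN one is a pure placeholder, so substitute the values from your own schema files. The fragment is well-formed but not a valid label, since the required content is missing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: both namespace URIs below are placeholders/assumptions. -->
<pds:Product_Observational
    xmlns:pds="http://pds.nasa.gov/pds4/pds/v1"
    xmlns:sbn="http://example.org/placeholder/sbn">
  <!-- Every PDS element would carry the pds: prefix,
       every SBN dictionary element the sbn: prefix. -->
</pds:Product_Observational>
```

With this choice every element name in the label has to be written with its prefix, which is more typing but makes it obvious which dictionary each element comes from.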

Once you're done adding namespaces, click the Finish button and the skeleton of your new XML file will appear in your editor window, though there's still some touching up to do:


Add the Schematron File Reference

The first thing to add is the <?xml-model?> processing instruction to tell the Schematron validator where to find the reference schematron file. Here I've added the instruction with a reference to the location of the schematron file relative to the root of my "Demo" project. Note that the file extension for schematron files is .sch:
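
As a rough sketch of what that instruction can look like: the href value below assumes the Schematron file sits in a "Schemas" folder and is named after the schema file, both of which are illustrative guesses rather than values taken from the example project. The schematypens value is the standard ISO Schematron namespace. The namespace URI on the root element is likewise an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed path: the "Schemas" folder and .sch file name are illustrative. -->
<?xml-model href="Schemas/PDS4_PDS_0900B.sch"
            schematypens="http://purl.oclc.org/dsdl/schematron"?>
<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1">
  <!-- label content goes here -->
</Product_Observational>
```

The processing instruction has to come before the root element, in the document prolog.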


Edit Namespaces, Prefixes, and SchemaLocations

Next, I need to put some line breaks inside the opening Product_Observational tag so I can see what's going on. Whitespace inside tags is not significant, so adding line breaks and padding will not cause a problem, and it does make it easier for me to get the accounting right:
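
A sketch of how the opening tag might look once it is broken across lines, one attribute per line. The PDS namespace URI, the SBN namespace URI (a pure placeholder), and the schema path in xsi:schemaLocation are all assumptions; what Eclipse generates for you depends on your schema and the choices made in the wizard. Any xml-model instruction is left out here to keep the focus on the root tag.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: namespace URIs and the schema path are placeholders. -->
<Product_Observational
    xmlns="http://pds.nasa.gov/pds4/pds/v1"
    xmlns:sbn="http://example.org/placeholder/sbn"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://pds.nasa.gov/pds4/pds/v1 Schemas/PDS4_PDS_0900B.xsd">
</Product_Observational>
```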


Here you can see the result of the namespace definition we did in creating the label - the PDS4 common namespace is identified and assigned no prefix; the SBN namespace is also present and is assigned the "sbn" prefix.

The other namespace, "xsi", exists to provide the "xsi:schemaLocation" attribute definition. But we're using XML Catalog files to translate namespace URIs into schema locations, so we don't need the "xsi:schemaLocation" attribute or the namespace definition. You can leave the namespace definition if you like (it won't hurt anything), but the xsi:schemaLocation attribute has to go. Delete it and remember to close the Product_Observational tag with ">". Here's what I'm left with after deleting the dead wood:
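
A sketch of a trimmed root tag, with both the xsi:schemaLocation attribute and the xsi namespace definition removed. The two namespace URIs are assumptions (the SBN one is a pure placeholder), so use the values from your own schema files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: both namespace URIs below are placeholders/assumptions. -->
<Product_Observational
    xmlns="http://pds.nasa.gov/pds4/pds/v1"
    xmlns:sbn="http://example.org/placeholder/sbn">
</Product_Observational>
```

Note the closing ">" on the last namespace line: it is easy to delete along with the xsi:schemaLocation attribute, which leaves the file ill-formed.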


If you decide later to add additional namespaces, you can add them to the list in the root tag (<Product_Observational>, in this case).

Now you're ready to start editing the label content...