Creating a Collection

From The SBN Wiki

Revision as of 22:25, 5 July 2013

Collections are the primary means for organizing related PDS4 products. (Collections are themselves organized into bundles.) The primary member products of a collection have logical identifiers (LIDs) based on the collection's LID.

What Goes Into a Collection?

Typically, all the products in a collection will be of the same basic type (observational, document, etc.). Observational collections will also usually contain products that share the same instrument, mission phase, observational target, and/or calibration level. Data preparers may opt to use criteria like review cycle or publishing deadline to assign collection membership, to simplify bookkeeping for their data deliveries.

Organizing the Data

PDS places no requirement on the physical organization of the data, although data preparers will need to agree on an organization with their consulting PDS node for data transfer. (A typical organization is described in section 2B of the PDS4 Standards Reference.)

For collections with very few products (fewer than about six), everything can go into a single directory. For larger collections, any reasonable directory hierarchy can be used, in which case the collection product itself (the inventory file and the label) will be located in the root directory of that hierarchy.

Versioning

Every PDS4 product - observation, document, or collection - has its own version number. The product version number tracks changes in both the label and the data files of that product. The version number of a collection product tracks changes in the collection label and inventory table.

For the collection, an increment in the minor version number typically indicates small changes to the collection label, while an increment in the major version number indicates changes to the inventory table.
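As a rough illustration of this convention, the sketch below suggests the next version_id for a collection product. It assumes PDS4-style "major.minor" version IDs and assumes that a major increment resets the minor number to 0; the function name next_collection_vid is hypothetical, and the rule it encodes is the typical practice described above, not a hard requirement.

```python
def next_collection_vid(current_vid: str, inventory_changed: bool) -> str:
    """Suggest the next version_id for a collection product (hypothetical helper).

    Assumes a "major.minor" version_id. A change to the inventory table bumps
    the major number (resetting minor to 0, an assumption); a label-only
    change bumps the minor number.
    """
    major_str, minor_str = current_vid.split(".")
    major, minor = int(major_str), int(minor_str)
    if inventory_changed:
        return f"{major + 1}.0"
    return f"{major}.{minor + 1}"


# Example: a label-only fix, then a change to the inventory table
print(next_collection_vid("1.0", inventory_changed=False))  # 1.1
print(next_collection_vid("1.1", inventory_changed=True))   # 2.0
```

Minor numbers are treated as integers, so "1.9" is followed by "1.10" rather than "2.0".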

Describing the Collection

The primary description for the collection is, of course, its product label. See the Collection Product topic on the PDS4 Product Labels, Step by Step page for a walk-through of the collection product label structures. Some things to remember for the collection label:

  • The <description> in the <Citation_Information> should be a brief abstract of the collection contents. If you are creating a new major version of a previous collection by updating all (or nearly all) of its products, the abstract should usually mention the major change(s) in the new version.
  • The <Modification_Area> should have at least one new entry for each new version of the collection, to indicate what has changed.
  • For observational data collections, the <Observing_System> of the <Context_Area> can and should be used to tie the collection as a whole to things like spacecraft and/or instrument wherever appropriate.

Primary vs. Secondary Members

Collections are aggregate products. They exist to define relationships between simple products like observations or documents. Any collection may contain two types of member products: primary and secondary.

Primary members have logical identifiers (LIDs) that contain the collection's LID. A simple product must be a primary member of exactly one collection: the one on which the product's LID is based. A simple product is always and inextricably associated with its primary collection. (The PDS sends products to the deep archive in their primary collections.) If a primary member product is updated to a new version, the collection product must always be updated as well.

Secondary members of a collection are primary members of some other collection. Any simple product may be listed as a secondary member of any number of additional collections. In most cases, when a secondary member of a collection is updated, it is not necessary to update the collection.

A collection may contain only primary, only secondary, or both primary and secondary member products.
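The LID rule for primary membership can be checked mechanically. The sketch below assumes PDS4-style colon-delimited URN identifiers, in which a primary member's LID is the collection's LID followed by exactly one more colon-separated field; the helper name member_status and the example LIDs are hypothetical.

```python
def member_status(product_lid: str, collection_lid: str) -> str:
    """Return "P" if product_lid is based on collection_lid, else "S".

    Assumes colon-delimited URN LIDs: a primary member's LID is the
    collection LID followed by ":" and exactly one more field.
    """
    prefix = collection_lid + ":"
    if not product_lid.startswith(prefix):
        return "S"
    rest = product_lid[len(prefix):]
    # Exactly one non-empty trailing field means the LID is based on this collection
    return "P" if rest and ":" not in rest else "S"


collection = "urn:nasa:pds:example_bundle:data"  # hypothetical collection LID
print(member_status("urn:nasa:pds:example_bundle:data:obs_001", collection))
print(member_status("urn:nasa:pds:other_bundle:document:user_guide", collection))
```

A result of "S" only means the product could not be a primary member of this collection; whether to list it as a secondary member is still the data preparer's decision.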

Compiling the Inventory Table

The inventory table identifies all the products (primary or secondary) that make up the collection. The table must be formatted as a comma-delimited table, with one line for each member of the collection and a carriage-return/line-feed (CR/LF) pair at the end of every line. Each line has two fields:

  1. The first field is a single character, either "P" or "S", to indicate the status of the member product - primary or secondary.
  2. The second field identifies the member product by its logical identifier and, for all primary and some secondary members, its version identifier.

Primary members, indicated by a "P" in the first field, must be identified by both logical and version identifiers. The format is:

<logical_identifier>::<version_id>

where logical_identifier (LID) and version_id (VID) are both taken from the attributes of the same name in the <Identification_Area> of the member product. For primary members that have been updated, only the latest (highest-numbered) version of the product is listed in the inventory table.
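A minimal sketch of writing the inventory table under these rules, assuming members are supplied as hypothetical (status, lid, vid) tuples. It terminates every line with CR/LF, refuses a primary member without a VID, lets a secondary member omit its VID, and keeps only the highest version of any primary member listed more than once. All function names and example LIDs are hypothetical, and the version comparison assumes "major.minor" VIDs.

```python
from __future__ import annotations


def vid_key(vid: str) -> tuple[int, int]:
    """Sort key for a "major.minor" version_id (so "1.10" sorts after "1.9")."""
    major, minor = vid.split(".")
    return int(major), int(minor)


def inventory_line(status: str, lid: str, vid: str | None = None) -> str:
    """Format one inventory record, terminated by CR/LF."""
    if status not in ("P", "S"):
        raise ValueError(f"status must be 'P' or 'S', got {status!r}")
    if status == "P" and not vid:
        raise ValueError(f"primary member {lid} must include a version_id")
    ref = f"{lid}::{vid}" if vid else lid  # secondary members may omit the VID
    return f"{status},{ref}\r\n"


def build_inventory(members: list[tuple[str, str, str | None]]) -> str:
    """Build the full table text from (status, lid, vid) tuples."""
    rows: dict[tuple[str, str], str | None] = {}  # keeps first-seen order
    for status, lid, vid in members:
        inventory_line(status, lid, vid)  # validate early; result discarded
        key = (status, lid)
        if status == "P" and key in rows and vid_key(vid) <= vid_key(rows[key]):
            continue  # an older version of a primary member: skip it
        rows[key] = vid
    return "".join(inventory_line(s, lid, vid) for (s, lid), vid in rows.items())


members = [  # hypothetical LIDs
    ("P", "urn:nasa:pds:example_bundle:data:obs_001", "1.0"),
    ("P", "urn:nasa:pds:example_bundle:data:obs_001", "2.0"),
    ("S", "urn:nasa:pds:other_bundle:document:user_guide", None),
]
print(repr(build_inventory(members)))
```

When saving the result, open the file with newline="" (for example, open("collection_inventory.csv", "w", newline="")) so Python does not translate the CR/LF line endings on Windows; the file name here is only an example.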

Secondary members, indicated by an "S" in the first field, may be identified either by LID+VID or by LID alone. Omitting the VID implies that the latest available version of that member product should be considered a member of the collection. Including a VID implies that only that specific version is considered a member, even if a new version of the product becomes available.
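Reading a table back means handling both forms of secondary entry. The sketch below parses inventory text into records and then resolves which version counts as the member: a pinned VID stays fixed, while a LID-only secondary entry resolves to the highest version currently available. The available mapping is a hypothetical stand-in for whatever source of version information you actually have (for example, a registry query); it and all other names here are assumptions, and "major.minor" VIDs are assumed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    status: str       # "P" (primary) or "S" (secondary)
    lid: str
    vid: str | None   # None means "latest available version" (secondary only)


def vid_key(vid: str) -> tuple[int, int]:
    major, minor = vid.split(".")
    return int(major), int(minor)


def parse_inventory(text: str) -> list[Member]:
    members = []
    for line in text.split("\r\n"):
        if not line:
            continue  # the final CR/LF leaves an empty string at the end
        status, ref = line.split(",", 1)
        lid, sep, vid = ref.partition("::")
        if status not in ("P", "S"):
            raise ValueError(f"bad status field: {line!r}")
        if status == "P" and not sep:
            raise ValueError(f"primary member without a version_id: {line!r}")
        members.append(Member(status, lid, vid if sep else None))
    return members


def resolve_vid(member: Member, available: dict[str, list[str]]) -> str:
    """Return the version that counts as the collection member."""
    if member.vid is not None:
        return member.vid  # pinned: this version only, even if newer ones exist
    return max(available[member.lid], key=vid_key)  # LID-only: latest version


text = (
    "P,urn:nasa:pds:example_bundle:data:obs_001::2.0\r\n"
    "S,urn:nasa:pds:other_bundle:document:user_guide\r\n"
)
available = {"urn:nasa:pds:other_bundle:document:user_guide": ["1.0", "1.2", "1.10"]}
for m in parse_inventory(text):
    print(m.status, m.lid, resolve_vid(m, available))
```

Because LID-only entries are resolved at read time, the same inventory can point to different versions on different days, which is why such a collection usually does not need a new version when its secondary members change.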