Difference between revisions of "PDS4 Data Structures"

From The SBN Wiki
(Safety Save)
 
(Safety Save)
Line 1: Line 1:
PDS4 data structures are designed for long-term archiving. In particular, for archival science data the primary data structures are arrays and tables. These simple structures not only ensure long-term stability in the archive, but they are also relatively hard to mis-read, reducing the amount of end-user error resulting from misunderstood record formats.
+
PDS4 data structures are designed for long-term archive stability and to support interdisciplinary use. Specifically, for archival science data the primary data structures are tables and arrays of 2-4 dimensions. These simple structures not only ensure long-term stability in the archive, but they are also relatively hard to mis-read, reducing the amount of end-user error resulting from misunderstood record formats.  The use of these simple, standardized formats also supports inter-disciplinary use of the data, as it tends to severely limit dependence on specific software packages for working with PDS4 data.
  
 
== Choosing Data Structures ==
 
== Choosing Data Structures ==
  
You should choose PDS4 data structures that match the logical view of your data.  For example, a simple image is archived as a 2D array object - ''not'' as a table of vectors.  Similarly, if you have a series of 2D arrays containing ancillary data for an image - like quality flags, dark current, bad pixel maps, and such -  each of those 2D arrays is archived as a separate data structure, ''not'' as pseudo-bands in an image cube.
+
You should choose PDS4 data structures that match the logical view of your data, as broken down into a series of tables and arrays.  For example, a simple image is archived as a 2D array object, ''not'' as a table of vectors.  Similarly, if you have a series of 2D arrays containing ancillary data for an image - like quality flags, dark current, bad pixel maps, and such -  each of those 2D arrays must be archived as a separate data structure, ''not'' as pseudo-bands in an image cube.
  
== Storage Structure ==
+
== File Structure ==
  
Finally, all PDS4 data structures stored in a file must be distinct from each other, and contiguous in themselves.  So you may not interleave records of a table with scan lines of an image.  For the archival science product, you must separate the table out into a contiguous block of bytes, though you can have the table and image stored sequentially in the same data file if you like.
+
Multiple data objects may be stored in a single data file, and frequently are. All PDS4 data structures stored in a file must be distinct from each other and contiguous in themselves.  So you may not, for example, interleave records of a table with scan lines of an image - you must separate the table out into one contiguous block of bytes and the image into another. The table and image may, of course, be stored sequentially in the same data file once separated into distinct data objects.
 +
 
 +
== Non-PDS Data Formats ==
 +
 
 +
Sometimes it happens that data formatted to some other sort of standard (a processing standard or a transport standard) may, coincidentally, already exist in a file as a series of separate arrays and tables that are consistent with the logical view of the data, and thus meet the PDS4 data structure requirements.  In these cases, the PDS4 label may be written to describe the data as it exists in the original file.  The PDS4 ''Header'' object may be used to indicate a header block formatted according to another standard that is also included in the file (a VICAR or FITS header, for example).  But even when the data file is PDS4-compliant, the PDS4 label must contain all the information needed for a user to read and interpret the data.  The ''Header'' object is merely a way of accounting for non-PDS4-compliant bytes; end users must be able to ignore it without consequence.
 +
 
 +
'''Note:''' Even though some files formatted to other data standards may be PDS4 compliant, there is no guarantee that any other file written in that standard will be. ''There is no other data format standard known as of this writing that is guaranteed to produce a PDS4-compliant data file in all cases.'' Every science product archived with the PDS ''must'' be in a PDS4-compliant format.  It is the data preparer's responsibility to ensure PDS4 compliance for all science data products submitted for archiving.
 +
 
 +
=== FITS Format ===
 +
 
 +
FITS is a transport format common in small bodies data.  Many FITS files contain data structures that are PDS4-compliant, though some data preparers use the format in non-standard ways or pack multiple, logically distinct arrays into a single multi-dimensional IMAGE in a way that is not PDS4-compliant. If you think you are dealing with a PDS4-compliant FITS file, see this page:
 +
 
 +
* [[Notes for Labelling FITS files]]

Revision as of 17:09, 25 November 2013

PDS4 data structures are designed for long-term archive stability and to support interdisciplinary use. Specifically, the primary data structures for archival science data are tables and arrays of 2-4 dimensions. These simple structures not only ensure long-term stability in the archive, but are also relatively hard to misread, which reduces end-user errors caused by misunderstood record formats. These simple, standardized formats also support interdisciplinary use of the data, because they tend to severely limit dependence on specific software packages for working with PDS4 data.

Choosing Data Structures

You should choose PDS4 data structures that match the logical view of your data, as broken down into a series of tables and arrays. For example, a simple image is archived as a 2D array object, not as a table of vectors. Similarly, if you have a series of 2D arrays containing ancillary data for an image - like quality flags, dark current, bad pixel maps, and such - each of those 2D arrays must be archived as a separate data structure, not as pseudo-bands in an image cube.
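
As a rough illustration of this rule, the sketch below models an image and its ancillary data as separate, named 2D arrays rather than as bands of a single cube. The `Array2D` class and the array names are hypothetical, chosen only for this example; they are not taken from the PDS4 standard or from any PDS tool.

```python
from dataclasses import dataclass


@dataclass
class Array2D:
    """Hypothetical stand-in for one 2D array data object."""
    name: str
    data: list  # list of lines, each line a list of samples

    @property
    def shape(self):
        # (lines, samples)
        return (len(self.data), len(self.data[0]))


# The image and each ancillary plane are separate objects with their own names.
image = Array2D("image", [[101, 102, 103], [104, 105, 106]])
quality = Array2D("quality_flags", [[0, 0, 1], [0, 0, 0]])
bad_pixels = Array2D("bad_pixel_map", [[0, 0, 0], [0, 1, 0]])

product = [image, quality, bad_pixels]

# Each object is logically 2D and is found by name, so a reader never has to
# know which "band" of a cube happens to hold the quality flags.
by_name = {obj.name: obj for obj in product}
shapes = {obj.name: obj.shape for obj in product}
```

The design point is that the structure of each object matches what it logically is: three 2D arrays, not one 3D array whose third axis mixes unrelated quantities.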

File Structure

Multiple data objects may be stored in a single data file, and frequently are. All PDS4 data structures stored in a file must be distinct from each other and contiguous in themselves. So you may not, for example, interleave records of a table with scan lines of an image - you must separate the table out into one contiguous block of bytes and the image into another. The table and image may, of course, be stored sequentially in the same data file once separated into distinct data objects.
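
The layout described above can be sketched as follows: a table and an image are written one after the other into the same byte stream, each as a single contiguous block, and each is read back using only its starting byte offset. The record layout, field types, and helper names here are assumptions made up for the example, not part of any PDS4 product definition.

```python
import io
import struct

# Hypothetical layout: 3 table records (little-endian float64 + int32 each),
# followed by a 2-line by 4-sample image of unsigned 16-bit pixels.
RECORD = struct.Struct("<di")
LINES, SAMPLES = 2, 4

table = [(0.5, 1), (1.5, 2), (2.5, 3)]
image = [[10, 11, 12, 13], [20, 21, 22, 23]]

buf = io.BytesIO()
table_offset = buf.tell()
for rec in table:                      # the whole table first, contiguously
    buf.write(RECORD.pack(*rec))
image_offset = buf.tell()
for line in image:                     # then the whole image, contiguously
    buf.write(struct.pack(f"<{SAMPLES}H", *line))
data = buf.getvalue()


def read_table(data, offset, n_records):
    """Read n_records fixed-width records starting at byte offset."""
    return [RECORD.unpack_from(data, offset + i * RECORD.size)
            for i in range(n_records)]


def read_image(data, offset, lines, samples):
    """Read a lines x samples image of uint16 starting at byte offset."""
    line_size = samples * 2
    return [list(struct.unpack_from(f"<{samples}H", data, offset + r * line_size))
            for r in range(lines)]
```

Because neither object is interleaved with the other, a starting offset plus the object's own dimensions is enough to locate every byte of it.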

Non-PDS Data Formats

Data formatted to some other standard (a processing standard or a transport standard) may, coincidentally, already exist in a file as a series of separate arrays and tables that are consistent with the logical view of the data, and thus meet the PDS4 data structure requirements. In these cases, the PDS4 label may be written to describe the data as it exists in the original file. The PDS4 Header object may be used to indicate a header block formatted according to another standard that is also included in the file (a VICAR or FITS header, for example). But even when the data file is PDS4-compliant, the PDS4 label must contain all the information needed for a user to read and interpret the data. The Header object is merely a way of accounting for non-PDS4-compliant bytes; end users must be able to ignore it without consequence.
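
To make the "ignore it without consequence" requirement concrete, here is a minimal sketch of a reader that skips a foreign header entirely. It builds a mock FITS-style header (one 2880-byte block of 80-character cards) followed by a small big-endian 16-bit array, then reads the array using only a `label` dictionary. That dictionary is a hypothetical, heavily simplified stand-in for the information a PDS4 label would carry; its keys are invented for this example.

```python
import struct

BLOCK = 2880  # FITS headers occupy whole 2880-byte blocks


def card(key, value):
    """Format one 80-character header card (mock, for illustration only)."""
    return f"{key:<8}= {value:>20}".ljust(80)


cards = [card("SIMPLE", "T"), card("BITPIX", 16), card("NAXIS", 2),
         card("NAXIS1", 2), card("NAXIS2", 2), "END".ljust(80)]
header = "".join(cards).encode("ascii").ljust(BLOCK, b" ")

pixels = [[1, 2], [3, 4]]
payload = b"".join(struct.pack(">2h", *line) for line in pixels)
file_bytes = header + payload

# Hypothetical label contents: the header is only accounted for by offset and
# length; everything needed to read the array is stated independently.
label = {
    "header": {"offset": 0, "length": BLOCK},
    "array": {"offset": BLOCK, "lines": 2, "samples": 2, "format": ">h"},
}


def read_array(data, desc):
    """Read a 2D array using only the label description, never the header."""
    size = struct.calcsize(desc["format"])
    rows = []
    for line in range(desc["lines"]):
        start = desc["offset"] + line * desc["samples"] * size
        rows.append([struct.unpack_from(desc["format"], data, start + s * size)[0]
                     for s in range(desc["samples"])])
    return rows


# Overwriting the header bytes does not change what the reader returns.
scrambled = b"\x00" * BLOCK + payload
```

Here `read_array(file_bytes, label["array"])` and `read_array(scrambled, label["array"])` give the same result, which is the property the Header object is meant to preserve.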

Note: Even though some files formatted to other data standards may be PDS4 compliant, there is no guarantee that any other file written in that standard will be. There is no other data format standard known as of this writing that is guaranteed to produce a PDS4-compliant data file in all cases. Every science product archived with the PDS must be in a PDS4-compliant format. It is the data preparer's responsibility to ensure PDS4 compliance for all science data products submitted for archiving.

FITS Format

FITS is a transport format common in small bodies data. Many FITS files contain data structures that are PDS4-compliant, though some data preparers use the format in non-standard ways or pack multiple, logically distinct arrays into a single multi-dimensional IMAGE in a way that is not PDS4-compliant. If you think you are dealing with a PDS4-compliant FITS file, see this page:

Notes for Labelling FITS files