Difference between revisions of "Filling Out the Array 2D Data Structure"

From The SBN Wiki
(Update for IM 1.14.0.0)
 
Latest revision as of 17:29, 3 August 2020

The <Array_2D> class is the generic base on which all the specific Array_2D_* classes are built. Use this class only when one of the more specific flavors cannot be reasonably applied.

In most cases, even a generic Array_2D data structure should be accompanied by a <Display_Settings> class in the Discipline_Area of the label, to define the correct way to draw the data on a display device. If you think this does not apply to your Array_2D, please contact your SBN consultant ASAP for an argument. See Filling Out the Display Dictionary Classes for more information.

For additional explanation, see the PDS4 Standards Reference, or contact your PDS node consultant.

Following are the attributes and subclasses you'll find in <Array_2D>, in label order.

Note that in the PDS4 master schema, all classes have capitalized names; attributes never do.

<name>

OPTIONAL

This attribute can be used to give a descriptive name to the array.

<local_identifier>

OPTIONAL

If you need to reference this array class from somewhere else in the label (like in a <Display_Settings> class describing the correct display orientation), use this attribute to define a local identifier to use as a hook. Since nearly all arrays should have display information included, you should get in the habit of providing <local_identifier> attributes for all array-type objects. Follow the rules for naming a variable in a typical programming language and you should be OK.

<md5_checksum>

OPTIONAL

Use this attribute to supply an MD5 checksum for the data object only. In general, if the data object occupies the entire file, then the checksum should be given as an attribute of the <File> class. This checksum is calculated using only the bytes defined by this Array data structure.
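As a rough illustration of checksumming only the bytes of the array, the sketch below reads a byte range from a file and hashes it. The function name and its parameters are hypothetical; it assumes you already know the byte offset of the array in the file and its total size in bytes (number of elements times bytes per element).

```python
import hashlib


def array_md5(path, offset_bytes, n_bytes):
    """Return the MD5 hex digest of n_bytes starting at offset_bytes in the file.

    The bytes are hashed exactly as stored: no scaling, offsets, or
    byte-swapping are applied (an assumption consistent with hashing
    "only the bytes defined by this Array data structure").
    """
    with open(path, "rb") as f:
        f.seek(offset_bytes)
        data = f.read(n_bytes)
    if len(data) != n_bytes:
        raise ValueError("file is shorter than the array described in the label")
    return hashlib.md5(data).hexdigest()
```

Python's `hexdigest()` returns the digest as 32 lowercase hexadecimal digits.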

<offset>

REQUIRED

This is the offset in bytes from the beginning of the file containing the array data to the beginning of the array. You must specify a unit of "byte" for this attribute, thus:

    <offset unit="byte">0</offset>

<axes>

REQUIRED

This attribute is required to be present and must have a value of "2".

<axis_index_order>

REQUIRED

This attribute is required to be present and must have a value of Last Index Fastest. "Last" is with respect to the <sequence_number> values in the <Axis_Array> classes.
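To make "Last Index Fastest" concrete, here is a minimal sketch of where a given element sits in storage order. The function names, the zero-based indices, and the parameter names are assumptions made for illustration; `i` runs along the axis with sequence_number 1 and `j` along the axis with sequence_number 2.

```python
def flat_index(i, j, n_first, n_second):
    """Zero-based position of element (i, j) in storage order.

    With the last index varying fastest, stepping j by one moves to the
    next stored element, while stepping i by one skips n_second elements.
    """
    if not (0 <= i < n_first and 0 <= j < n_second):
        raise IndexError("index outside the array dimensions")
    return i * n_second + j


def byte_position(i, j, n_first, n_second, element_bytes, offset_bytes=0):
    """Byte position in the file of element (i, j), given the array's byte offset."""
    return offset_bytes + flat_index(i, j, n_first, n_second) * element_bytes
```

For a 112x256 array, `flat_index(0, 1, 112, 256)` is 1 and `flat_index(1, 0, 112, 256)` is 256.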

<description>

OPTIONAL

This is a place where additional description can be included, if desired.

<Element_Array>

REQUIRED

This class defines the attributes of the array element.

<data_type>

REQUIRED

The value here must be one of the binary numeric types from the list in the Standard Values Quick Reference.

<unit>

OPTIONAL

If there is a unit of measure associated with the array element values, use this attribute to specify it.

If the data are unitless, DO NOT INCLUDE THIS ATTRIBUTE! The SBN will not accept data sets containing these or the equivalent:

    <unit/>
    <unit>N/A</unit>

<scaling_factor>

OPTIONAL

If the data have been scaled (divided by a constant), put the scaling factor in this attribute.

When reading the data, the value is multiplied by the <scaling_factor> value before adding the <value_offset>.

<value_offset>

OPTIONAL

If an offset has been subtracted from the data, put the offset value in this attribute. Offsets may be positive or negative.

When reading the data, the value is first multiplied by <scaling_factor>, then the <value_offset> is added.
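The read-time conversion described above can be sketched as a one-line function. The function name is hypothetical, and the defaults of 1 and 0 for an absent scaling factor or offset are an assumption chosen so that omitting either attribute leaves the stored value unchanged.

```python
def to_physical(stored, scaling_factor=1.0, value_offset=0.0):
    """Recover the observational value from a stored array element.

    The stored value is multiplied by scaling_factor first, and only
    then is value_offset added.
    """
    return stored * scaling_factor + value_offset
```

For example, a stored value of 10 with a scaling factor of 0.5 and a value offset of 100 gives 10 x 0.5 + 100 = 105.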

<Axis_Array>

REQUIRED

This class describes one dimension of the two-dimensional array. There must be exactly two instances of this class in any Array_2D_* object.

<axis_name>

REQUIRED

This is the name of the array axis being described. The axis_name is typically something like "Wavelength" or "Distance". The value should be useful for labelling the axis in a display. For some data structures derived from the <Array_2D> structure, the names of the axes may be fixed. The value may contain embedded spaces and UTF-8 characters, unless it is further constrained by the specific type of Array_2D object you're describing.

Note: Axis names should be unique (within the array object), but this is not currently enforced. Cut and paste carefully.

<local_identifier>

OPTIONAL

This is a unique (within the label) name for the axis. So if your label contains three different arrays, all three can have an axis with an axis_name value of "Line", but they may not have the same values for local_identifier. Include this attribute when you will be referencing specific axes of your data structure(s) from some other part of the label.

<elements>

REQUIRED

This attribute must contain the number of elements along this axis of the array. For example, if the Array_2D in question has dimensions 112x256, then the <elements> value in the first <Axis_Array> would be "112".

<sequence_number>

REQUIRED

This number defines an order for the axes so that the <axis_index_order> value can be interpreted correctly for this Array. One of the axes must have a sequence_number of "1", the other "2". It is required that the first Axis_Array class appearing in the label have <sequence_number> equal to "1"; the second equal to "2".
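The rules for the two Axis_Array classes can be checked mechanically before a label goes to review. The sketch below is only an illustration: the `AxisArray` dataclass and `check_axes` function are hypothetical names, and the axis values are assumed to have been extracted from the label already, in label order.

```python
from dataclasses import dataclass


@dataclass
class AxisArray:
    axis_name: str
    elements: int
    sequence_number: int


def check_axes(axes):
    """Return a list of problems found in the Axis_Array entries (empty if none)."""
    problems = []
    if len(axes) != 2:
        problems.append("an Array_2D needs exactly two Axis_Array classes")
    # The first Axis_Array in the label must be 1, the second 2.
    if [a.sequence_number for a in axes] != [1, 2]:
        problems.append("sequence_number values must be 1 then 2, in label order")
    # Uniqueness is not enforced by the schema, so check it here.
    if len({a.axis_name for a in axes}) != len(axes):
        problems.append("axis_name values should be unique within the array")
    return problems
```

A 112x256 array described as `[AxisArray("Line", 112, 1), AxisArray("Sample", 256, 2)]` passes with no problems reported; the axis names here are just examples.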

<Special_Constants>

OPTIONAL

Use this class to define any flag values that appear in the data to indicate drop outs, saturation, and other conditions that render a single pixel unknown. Every attribute in this class is optional. If you don't need any of the special constants, don't include this class in your Array_2D_*.

Note: All the special constant fields are defined as strings with the implicit assumption that the string can be converted to the same data type as defined for the corresponding <Element_Array>. This, however, requires that a character string be transformed into a numeric format before it can be compared to values in the data object itself. While integer conversions (within the hardware storage representation limits) are precise, floating-point representations are not unless the values are chosen with exquisite care. The figurative floating point values "NaN" and "+/-INF" are not permitted for use in these constants. So, if you are planning to provide Special_Constants for floating point data, please keep in mind that comparisons to values in the data file will more than likely need to take into account the conversion error involved in going from string to floating point hardware representation.
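The conversion issue in the note above can be seen in a few lines. This sketch assumes the array elements are IEEE 754 single-precision values; the helper names are hypothetical. It converts the label string to the nearest single-precision value before comparing, rather than comparing against the double-precision value Python produces by default.

```python
import math
import struct


def as_float32(text):
    """Convert a label string to the nearest single-precision value."""
    value = float(text)
    if not math.isfinite(value):
        raise ValueError("NaN and infinities are not permitted as special constants")
    # Round-trip through 4 bytes to apply single-precision rounding.
    return struct.unpack("<f", struct.pack("<f", value))[0]


def is_flag(stored_value, constant_text):
    """True if a single-precision stored value matches the special constant."""
    return stored_value == as_float32(constant_text)
```

A constant such as "0.1" is not exactly representable, so `float("0.1")` and `as_float32("0.1")` differ and a naive comparison against stored single-precision data fails. A constant such as "-9999" is exactly representable in both precisions, which is one reason integer-valued flags are a safer choice for floating point data.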

<saturated_constant>

OPTIONAL

This value indicates the data value was lost because of detector saturation.

<missing_constant>

OPTIONAL

This value indicates the data value is known to be missing for some reason not covered by the other constants available in this class.

<error_constant>

OPTIONAL

This value indicates the data value originally reported was known to be in error for some reason, and was replaced by this flag.

<invalid_constant>

OPTIONAL

This value indicates the data value originally recorded or calculated was outside the valid range for array elements.

<unknown_constant>

OPTIONAL

This value indicates the data value in this file is unknown because it was unknown in the source and cannot be recovered.

<not_applicable_constant>

OPTIONAL

This value indicates that the concept underlying the datum is not applicable in a particular context.

Note: No Array_2D_* should ever have a reason to use this constant. If you disagree, let me know.

<valid_maximum>

OPTIONAL

This value is the maximum possible observational value that might be in the data. This is useful if your flag values are greater than this value and you want to simplify the exclusion logic.

<high_instrument_saturation>

OPTIONAL

This value indicates the original datum was in the high-end saturation range of the instrument.

<high_representation_saturation>

OPTIONAL

This value is used to indicate that, while the original observed value was valid, it is out of range of the numeric format chosen for this Array_2D in a way that would be considered "too high" - absolute magnitude too great, positive value too large, or positive exponent too large to be represented.

<valid_minimum>

OPTIONAL

This value is the minimum possible observational value that might be in the data. This is useful if your flag values are less than this value and you want to simplify the exclusion logic.

<low_instrument_saturation>

OPTIONAL

This value indicates the original datum was in the low-end saturation range of the instrument.

<low_representation_saturation>

OPTIONAL

This value is used to indicate that, while the original observed value was valid, it is out of range of the numeric format chosen for this Array_2D in a way that would be considered "too low" - negative value too large or negative exponent too large to be represented.

<Object_Statistics>

OPTIONAL

This class provides a place for statistical values calculated from the real data values of the pixels in the array. Every attribute in this class is optional. If you don't need any of the statistics, don't include this class in your Array_2D_*.

<local_identifier>

OPTIONAL

If you need to refer to this specific set of Object_Statistics from elsewhere in this label, this is the place to attach an identifier to it. If your identifier looks like a variable name in a typical programming language, you should be OK.

<maximum>

OPTIONAL

Maximum real data value found in the array as it exists in its file. That is, after any flag values identified in the corresponding <Special_Constants> class are ignored and any relevant bit mask is applied, but before value_offset or scaling_factor are applied.

<minimum>

OPTIONAL

Minimum real data value found in the array as it exists in its file. That is, after any flag values identified in the corresponding <Special_Constants> class are ignored and any relevant bit mask is applied, but before value_offset or scaling_factor are applied.

<mean>

OPTIONAL

This is the arithmetic mean of the values in the array, excluding those elements containing flag values defined in the associated <Special_Constants> class, in the same units as the element. Any bit mask is applied before the calculation, but offset and scaling factor are not.

<standard_deviation>

OPTIONAL

This is the standard deviation of the values in the array about the <mean>, excluding those elements containing flag values defined in the associated <Special_Constants> class, in the same units as the element. Bit mask is applied; offset and scaling factor are not.

<median>

OPTIONAL

This attribute contains the median value of the real data values (excluding flag values) in the array, in the same units as the element. Any bit mask is applied prior to determining the median, but offset and scaling factor are not.

<maximum_scaled_value>

OPTIONAL

This is the maximum observational value represented in the array. Flag values are excluded; bit mask, scaling factor, and offset are all applied before determining this value.

<minimum_scaled_value>

OPTIONAL

This is the minimum observational value represented in the array. Flag values are excluded; bit mask, scaling factor, and offset are all applied before determining this value.

<description>

OPTIONAL

If you need to provide any additional information or caveats about the statistics, this is the place to do it.
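The order of operations for the Object_Statistics values can be summarized in a short sketch. Everything here is illustrative: the function name and parameters are hypothetical, the stored values are assumed to have been read (and any bit mask applied) already, and the population form of the standard deviation is an assumption, since the text above does not say which form is intended.

```python
import statistics


def object_statistics(stored_values, flag_values=(), scaling_factor=1.0, value_offset=0.0):
    """Compute Object_Statistics-style values from stored array elements."""
    # Flag values from Special_Constants are excluded from every statistic.
    real = [v for v in stored_values if v not in flag_values]
    if not real:
        raise ValueError("no real data values left after excluding flags")
    # These are computed on the values as stored: no scaling, no offset.
    stats = {
        "maximum": max(real),
        "minimum": min(real),
        "mean": statistics.mean(real),
        "median": statistics.median(real),
        "standard_deviation": statistics.pstdev(real),  # assumed: population form
    }
    # The *_scaled_value attributes are computed after scaling and offset.
    scaled = [v * scaling_factor + value_offset for v in real]
    stats["maximum_scaled_value"] = max(scaled)
    stats["minimum_scaled_value"] = min(scaled)
    return stats
```

Taking the maximum and minimum of the scaled list, rather than scaling the stored maximum and minimum, keeps the result correct even when the scaling factor is negative.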