Filling Out the Array 2D Data Structure

From The SBN Wiki
Revision as of 18:12, 20 February 2013 by Lnagdi1 (talk | contribs)

The <Array_2D> class is the generic base on which all the specific Array_2D_* classes are built. Use this class only when one of the more specific flavors cannot be reasonably applied.


For additional explanation, see the PDS4 Standards Reference, or contact your PDS node consultant.

Following are the attributes and subclasses you'll find in <Array_2D>, in label order.

Note that in the PDS4 master schema, all classes have capitalized names; attributes never do.

<name>

OPTIONAL This attribute can be used to give a descriptive name to the array.

<local_identifier>

OPTIONAL If you need to reference this array class from somewhere else in the label, use this attribute to define a local identifier to use as a hook. Follow the rules for naming a variable in a typical programming language and you should be OK.

<offset>

REQUIRED

This is the offset in bytes from the beginning of the file containing the array data to the beginning of the array. You must specify a unit of "byte" for this attribute, thus:

    <offset unit="byte">0</offset>

<axes>

REQUIRED This attribute is required to be present and must have a value of "2".

<axis_index_order>

REQUIRED This attribute is required to be present and must have a value of Last_Index_Fastest. "Last" is with respect to the <sequence_number> values in the <Axis_Array> classes.
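
To make Last_Index_Fastest concrete, here is a minimal Python sketch of where a given element sits in the file. It assumes zero-based indices, with i running along the axis whose <sequence_number> is 1 and j along the axis whose <sequence_number> is 2. The function and parameter names are illustrative only and are not part of PDS4.

```python
def element_byte_position(offset, i, j, elements_axis2, element_size):
    """Byte position in the file of element (i, j), zero-based.

    offset         -- value of <offset>, in bytes
    elements_axis2 -- <elements> of the axis with sequence_number 2
    element_size   -- size of one array element, in bytes (assumed known
                      from the <data_type>)
    """
    # The last axis varies fastest, so stepping j by one moves one element,
    # while stepping i by one skips a full run of elements_axis2 elements.
    linear_index = i * elements_axis2 + j
    return offset + linear_index * element_size
```

For a 112x256 array of 2-byte elements starting at byte 0, element (0, 1) starts at byte 2 and element (1, 0) starts at byte 512.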

<encoding_type>

REQUIRED This attribute may have a value of either Binary or Character. The correct answer is almost certainly Binary. If you think you have a case of Character, contact your PDS consultant first.

<description>

OPTIONAL This is a place where additional description can be included, if desired.

<Element_Array>

REQUIRED This class defines the attributes of the array element.

<data_type>

REQUIRED The value here must be one from the list in the Standard Values Quick Reference.

<unit>

OPTIONAL If there is a unit of measure associated with the array element values, use this attribute to specify it.

If the data are unitless, DO NOT INCLUDE THIS ATTRIBUTE! The SBN will not accept data sets containing these or the equivalent:

    <unit/>
    <unit>N/A</unit>

<scaling_factor>

OPTIONAL If the data have been scaled (multiplied or divided by a constant), put the scaling factor in this attribute. When reading the data, <scaling_factor> is applied to the stored value before adding the <value_offset> value.

<value_offset>

OPTIONAL If an offset has been applied to the data, put the offset value in this attribute. Offsets may be positive or negative. When reading the data, <scaling_factor> is applied to the stored value before adding the <value_offset> value. Do not confuse this attribute with <offset>, which is the byte location of the array in the file.
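
The order of operations described above (scale first, then add the offset) can be sketched in Python as follows. The function name is made up for illustration, and the defaults of 1 and 0 for an absent <scaling_factor> or <value_offset> are an assumption of this sketch.

```python
def to_physical(stored, scaling_factor=1.0, value_offset=0.0):
    """Convert a value as stored in the file to its real (physical) value.

    The scaling factor is applied first, then the value offset is added.
    The defaults (1.0 and 0.0) are assumed here for labels that omit
    the corresponding attribute.
    """
    return stored * scaling_factor + value_offset
```

For example, a stored value of 10 with a <scaling_factor> of 0.5 and a <value_offset> of 100 corresponds to a real value of 105.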

<Axis_Array>

REQUIRED This class describes one dimension of the two-dimensional array. There must be exactly two instances of this class in any Array_2D_* object.

<axis_name>

REQUIRED This is the name of the array axis being described. The axis_name is typically something like "Wavelength" or "Distance". The value should be useful for labelling the axis in a display.

<elements>

REQUIRED This attribute must contain the number of elements along this axis of the array. For example, if the Array_2D in question has dimensions 112x256, then the <elements> value in the first <Axis_Array> would be "112" and the value in the second would be "256".

<unit>

OPTIONAL If there is a unit associated with this axis, here's the place to put it. Once again, the intention is to provide a string to use in labelling axes in a display.

<sequence_number>

REQUIRED This number defines an order for the axes so that the <axis_index_order> value can be interpreted correctly for this Array. One of the axes must have a sequence_number of "1", the other "2". It is not necessary that the first Axis_Array class in the label has <sequence_number> equal to 1, but it would tend to make life easier for reviewers and users.
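
Putting <offset>, <elements> and <sequence_number> together, a reader for a binary array might look something like the Python sketch below. It is only an illustration: the function name is invented, and the default format "<H" (2-byte unsigned integer, least significant byte first) is an assumption standing in for whatever the <data_type> in your label actually describes.

```python
import io
import struct

def read_array_2d(stream, offset, n_axis1, n_axis2, fmt="<H"):
    """Read a Last_Index_Fastest 2D array into a list of rows.

    n_axis1 -- <elements> of the axis with sequence_number 1
    n_axis2 -- <elements> of the axis with sequence_number 2
    fmt     -- struct format of one element (assumed; derive it from
               the <data_type> in the label)
    """
    size = struct.calcsize(fmt)
    n_bytes = n_axis1 * n_axis2 * size
    stream.seek(offset)              # <offset> is counted from the start of the file
    raw = stream.read(n_bytes)
    if len(raw) != n_bytes:
        raise ValueError("file is too short for the array described in the label")
    flat = [item[0] for item in struct.iter_unpack(fmt, raw)]
    # Axis 2 varies fastest, so each consecutive run of n_axis2 values is one row.
    return [flat[r * n_axis2:(r + 1) * n_axis2] for r in range(n_axis1)]

# Usage: a 2-byte header followed by a 2x3 array of 16-bit values.
demo = io.BytesIO(b"\xff\xff" + struct.pack("<6H", 1, 2, 3, 4, 5, 6))
rows = read_array_2d(demo, offset=2, n_axis1=2, n_axis2=3)
```

Here rows[0] holds the three elements for which the first index is 0, and rows[1] the three for which it is 1.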

<Band_Bin_Set>

OPTIONAL This placeholder class contains no attributes. Until it does, do not use it in SBN data sets. If you think you need it, contact your PDS consultant.

<Special_Constants>

OPTIONAL Use this class to define any flag values that appear in the data to indicate drop outs, saturation, and other conditions that render a single pixel unknown. Every attribute in this class is optional. If you don't need any of the special constants, don't include this class in your Array_2D_*.

<saturated_constant>

OPTIONAL This value indicates the data value was lost because of detector saturation.

<missing_constant>

OPTIONAL This value indicates the data value is known to be missing for some reason not covered by the other constants available in this class.

<error_constant>

OPTIONAL This value indicates the data value originally reported was known to be in error for some reason, and was replaced by this flag.

<invalid_constant>

OPTIONAL This value indicates the data value originally recorded or calculated was outside the valid range for array elements.

<unknown_constant>

OPTIONAL This value indicates the data value in this file is unknown because it was unknown in the source and cannot be recovered.

<not_applicable_constant>

OPTIONAL This value indicates that the concept underlying the datum is not applicable in a particular context.
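
When processing the data, flag values defined in <Special_Constants> need to be set aside before the remaining values are treated as real data. A minimal Python sketch, assuming the constants have already been parsed from the label into a dictionary keyed by attribute name (the dictionary layout and function name are illustrative, not a PDS4 convention):

```python
def real_values(values, special_constants):
    """Return only the values that are not flagged by a special constant.

    special_constants -- mapping such as {"missing_constant": -9999},
                         containing only the constants present in the label
    """
    flags = set(special_constants.values())
    return [v for v in values if v not in flags]

# Usage with two hypothetical flag values.
constants = {"missing_constant": -9999, "saturated_constant": 65535}
cleaned = real_values([12, -9999, 40, 65535, 7], constants)
```

In this example cleaned contains only 12, 40 and 7.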

<Object_Statistics>

OPTIONAL

This class provides a place for statistical values calculated from the real data values of the pixels in the array. Every attribute in this class is optional. If you don't need any of the statistics, don't include this class in your Array_2D_*.

<local_identifier>

OPTIONAL

If you need to refer to this specific set of Object_Statistics from elsewhere in this label, this is the place to attach an identifier to it. If your identifier looks like a variable name in a typical programming language, you should be OK.

<maximum>

OPTIONAL

Maximum real data value found in the array as it exists in its file. That is, any flag values identified in the corresponding <Special_Constants> class are ignored, and no offset or scaling_factor is applied before comparing values read from the file.

Note: The data dictionary says that empty fields are also ignored, but to my knowledge there is no way to define an array element as being empty without using a special constant. The data dictionary also does not indicate whether any bit_mask should be applied before comparing values to determine the maximum. There's also a reference to "repeating fields" that doesn't make any sense. Ignore it.

<minimum>

OPTIONAL

Minimum real data value found in the array as it exists in its file. That is, any flag values identified in the corresponding <Special_Constants> class are ignored, and no offset or scaling_factor is applied before comparing values read from the file.

Note: The data dictionary says that empty fields are also ignored, but to my knowledge there is no way to define an array element as being empty without using a special constant. The data dictionary also does not indicate whether any bit_mask should be applied before comparing values to determine the minimum. There's also a reference to "repeating fields" that doesn't make any sense. Ignore it.

<mean>

OPTIONAL

This is the arithmetic mean of the values in the array, excluding those elements containing flag values defined in the associated <Special_Constants> class, in the same units as the element.

Note: The data dictionary does not specify whether this is the mean of the stored values, or the values after bit masks, scaling factors and offsets have been applied. There's also a reference to "repeating fields" that doesn't make any sense. Ignore it.

<standard_deviation>

OPTIONAL

This is the statistical standard deviation of the data values in the array, excluding those elements containing flag values defined in the associated <Special_Constants> class, in the same units as the element.

Note: The data dictionary does not specify whether this is the standard deviation of the stored values, or the values after bit masks, scaling factors and offsets have been applied. There's also a reference to "repeating fields" that doesn't make any sense. Ignore it.

<bit_mask>

OPTIONAL

For values not aligned on word boundaries, this attribute contains the bit mask used to recover the value from the words after reading them into memory. Bit masks are formulated as a simple string of ones and zeroes. For example:

    <bit_mask>00011111</bit_mask>

Bit masks are applied before scaling factors and offsets, using the standard bitwise-and logical operation.
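
As a rough Python illustration of the above, the mask string can be interpreted as a binary integer and combined with the stored value using bitwise-and. This sketch assumes the element has already been read as an unsigned integer; the function name is made up for the example.

```python
def apply_bit_mask(stored, bit_mask):
    """Apply a <bit_mask> string of ones and zeroes to a stored integer value."""
    mask = int(bit_mask, 2)      # "00011111" -> 31
    return stored & mask         # keep only the bits where the mask has a 1
```

With the mask 00011111, a stored byte of 10100101 (165) yields 00000101 (5). Scaling factors and offsets would then be applied to the masked value.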

Notes: Obviously, a bit mask isn't a "statistic". This belongs in the array element definition class, not here, as it is essential to being able to read the data properly.

The data dictionary doesn't actually say that bit masks are applied first, but nothing else makes any sense, really. The definition for the data type of bit_mask does not allow any syntax to flag that the value is a binary integer, nor does it constrain the value to have a length equal to the number of bits in the element to be masked. Neither does the Schematron file place any constraints on bit masks. Avoid them in SBN data unless absolutely, positively necessary. And then don't use them.

<median>

OPTIONAL

This attribute contains the median of the real data values (excluding flag values) in the array, in the same units as the element.

Note: The data dictionary does not specify whether this is the median of the stored values, or the values after bit masks, scaling factors and offsets have been applied. There's also a reference to "repeating fields" that doesn't make any sense. Ignore it.
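
For what it's worth, the maximum, minimum, mean, standard_deviation and median values described in this class could be computed along the lines of the Python sketch below. It works on the values as stored in the file, with flag values removed. Two assumptions to be aware of: the data dictionary ambiguities noted above are resolved here in favor of stored values, and the population standard deviation is used because the text does not say which form is intended. The function name is illustrative.

```python
import statistics

def object_statistics(values, flags):
    """Statistics of the stored values, ignoring any flag values.

    flags -- collection of values defined in <Special_Constants>
    """
    real = [v for v in values if v not in flags]
    return {
        "maximum": max(real),
        "minimum": min(real),
        "mean": statistics.mean(real),
        # Population form chosen as an assumption; the source does not specify.
        "standard_deviation": statistics.pstdev(real),
        "median": statistics.median(real),
    }

# Usage: -9999 is a hypothetical missing_constant.
stats = object_statistics([1, 2, 3, 4, -9999], {-9999})
```

If your pipeline uses the sample standard deviation instead, say so in the <description> attribute of this class.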

<md5_checksum>

OPTIONAL

This is the checksum of just the array (that is, it might be a checksum of part of a file), calculated using the MD5 algorithm. For the hex digits in the value, you must use the lowercase letters a-f.

<maximum_scaled_value>

OPTIONAL

This is the maximum value in the array, excluding flag values, after scaling factors and offsets have been applied to the values read in from the file. This value should be equal to the <maximum> value multiplied by the scaling_factor, plus the value_offset. Note, however, that you are not required to include both maximum and maximum_scaled_value.

Note: The data dictionary does not mention that bit masks must also be applied, but by this point you should have been expecting that.

<minimum_scaled_value>

OPTIONAL

This is the minimum value in the array, excluding flag values, after scaling factors and offsets have been applied to the values read in from the file. This value should be equal to the <minimum> value multiplied by the scaling_factor, plus the value_offset. Note, however, that you are not required to include both minimum and minimum_scaled_value.

Note: The data dictionary does not mention that bit masks must also be applied, but by this point you should have been expecting that.
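
One way to obtain the two scaled extremes is to scale every real value and then take the minimum and maximum of the results, as in the Python sketch below. The function name and the defaults of 1 and 0 for absent scaling attributes are assumptions of this example. Scaling each element first, rather than scaling the stored extremes, also gives the expected answer if the scaling factor happens to be negative.

```python
def scaled_extremes(values, flags, scaling_factor=1.0, value_offset=0.0):
    """Return (minimum_scaled_value, maximum_scaled_value).

    values -- stored values (after any bit mask has been applied)
    flags  -- collection of values defined in <Special_Constants>
    """
    scaled = [v * scaling_factor + value_offset for v in values if v not in flags]
    return min(scaled), max(scaled)

# Usage: -9999 is a hypothetical missing_constant.
lo, hi = scaled_extremes([1, 2, 3, -9999], {-9999}, scaling_factor=2.0, value_offset=10.0)
```

Here the stored minimum and maximum are 1 and 3, and the scaled extremes are 12 and 16.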

<description>

OPTIONAL

If you need to provide any additional information or caveats about the statistics, this is the place to do it.