Filling Out the Array 2D Data Structure

From The SBN Wiki

Revision as of 16:21, 21 July 2014

The <Array_2D> class is the generic base on which all the specific Array_2D_* classes are built. Use this class only when one of the more specific flavors cannot be reasonably applied.


For additional explanation, see the PDS4 Standards Reference, or contact your PDS node consultant.

Following are the attributes and subclasses you'll find in <Array_2D>, in label order.

Note that in the PDS4 master schema, all classes have capitalized names; attributes never do.

<name>

OPTIONAL

This attribute can be used to give a descriptive name to the array.

<local_identifier>

OPTIONAL

If you need to reference this array class from somewhere else in the label (like in a <Display_Settings> class describing the correct display orientation), use this attribute to define a local identifier to use as a hook. Since nearly all arrays should have display information included, you should get in the habit of providing <local_identifier> attributes for all array-type objects. Follow the rules for naming a variable in a typical programming language and you should be OK.

<offset>

REQUIRED

This is the offset in bytes from the beginning of the file containing the array data to the beginning of the array. You must specify a unit of "byte" for this attribute, thus:

    <offset unit="byte">0</offset>
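As a rough illustration of how a reader might use this value, the sketch below seeks to the byte offset and reads the array bytes. The function name `read_array_bytes`, the 4-byte header, and the choice of most-significant-byte-first unsigned 16-bit elements are all assumptions made up for the example, not part of the standard.

```python
import io
import struct

def read_array_bytes(stream, offset, element_count, element_size):
    """Return the raw bytes of an array starting `offset` bytes into `stream`."""
    stream.seek(offset)
    wanted = element_count * element_size
    raw = stream.read(wanted)
    if len(raw) != wanted:
        raise ValueError("file is shorter than the label implies")
    return raw

# Hypothetical file: a 4-byte header followed by six
# big-endian unsigned 16-bit integers (a 2x3 array).
fake_file = io.BytesIO(b"HDR0" + struct.pack(">6H", 1, 2, 3, 4, 5, 6))

raw = read_array_bytes(fake_file, offset=4, element_count=6, element_size=2)
values = struct.unpack(">6H", raw)
```

In this made-up file the label would carry an offset of 4 rather than 0, since the array does not start at the beginning of the file.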

<axes>

REQUIRED

This attribute is required to be present and must have a value of "2".

<axis_index_order>

REQUIRED

This attribute is required to be present and must have a value of "Last Index Fastest". "Last" is with respect to the <sequence_number> values in the <Axis_Array> classes.
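One way to picture this storage order is the sketch below, which assumes zero-based indices, with `i` running along the axis whose sequence number is 1 and `j` along the axis whose sequence number is 2. The helper names `flat_index` and `to_rows` are invented for the example.

```python
def flat_index(i, j, n_axis2):
    """Position of element (i, j) in the stored sequence.

    The index along the last axis (j) varies fastest, so stepping j by one
    moves one element, while stepping i by one moves a whole run of n_axis2.
    """
    return i * n_axis2 + j

def to_rows(values, n_axis1, n_axis2):
    """Split a flat sequence of stored values into n_axis1 runs of n_axis2."""
    values = list(values)
    return [values[r * n_axis2:(r + 1) * n_axis2] for r in range(n_axis1)]

# A 2x3 array stored as six consecutive elements.
rows = to_rows(range(6), 2, 3)
```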

<description>

OPTIONAL

This is a place where additional description can be included, if desired.

<Element_Array>

REQUIRED

This class defines the attributes of the array element.

<data_type>

REQUIRED

The value here must be one of the binary numeric types from the list in the Standard Values Quick Reference.

<unit>

OPTIONAL

If there is a unit of measure associated with the array element values, use this attribute to specify it.

If the data are unitless, DO NOT INCLUDE THIS ATTRIBUTE! The SBN will not accept data sets containing these or the equivalent:

    <unit/>
    <unit>N/A</unit>
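A quick self-check before submission could look something like the sketch below. It is only an illustration: the function name `bad_unit_elements` is hypothetical, the list of "equivalent" placeholder strings is a guess, and the fragment is assumed to carry no XML namespace (a full label would need namespace-aware tag names).

```python
import xml.etree.ElementTree as ET

# Assumed placeholder spellings; the wiki only names <unit/> and N/A explicitly.
PLACEHOLDERS = {"", "N/A", "NA", "NONE"}

def bad_unit_elements(label_fragment):
    """Return the text of every <unit> that is empty or a placeholder."""
    root = ET.fromstring(label_fragment)
    bad = []
    for unit in root.iter("unit"):
        text = (unit.text or "").strip()
        if text.upper() in PLACEHOLDERS:
            bad.append(text)
    return bad

empty_case = bad_unit_elements("<Element_Array><unit/></Element_Array>")
na_case = bad_unit_elements("<Element_Array><unit>N/A</unit></Element_Array>")
good_case = bad_unit_elements("<Element_Array><unit>DN</unit></Element_Array>")
```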

<scaling_factor>

OPTIONAL

If the data have been scaled (divided by a constant), put the scaling factor in this attribute.

When reading the data, the stored value is multiplied by the <scaling_factor> value before the <value_offset> value is added.

<value_offset>

OPTIONAL

If an offset has been subtracted from the data, put the offset value in this attribute. Offsets may be positive or negative.

When reading the data, the value is first multiplied by <scaling_factor>, then the <value_offset> value is added.
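The read-time arithmetic can be sketched as follows, taking the offset in question to be the element's <value_offset>. The function name `to_physical` is invented here, and treating a missing <scaling_factor> as 1 and a missing <value_offset> as 0 is an assumption of the sketch.

```python
def to_physical(stored, scaling_factor=1.0, value_offset=0.0):
    """Recover the observational value: multiply first, then add the offset."""
    return stored * scaling_factor + value_offset

# A stored value of 100 with scaling_factor 0.5 and value_offset 10.0.
example = to_physical(100, scaling_factor=0.5, value_offset=10.0)
```

The order matters: adding the offset before multiplying would give a different result whenever the scaling factor is not 1.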

<Axis_Array>

REQUIRED

This class describes one dimension of the two-dimensional array. There must be exactly two instances of this class in any Array_2D_* object.

<axis_name>

REQUIRED

This is the name of the array axis being described. The axis_name is typically something like "Wavelength" or "Distance". The value should be useful for labelling the axis in a display.

Note that for some data structures derived from the <Array_2D> structure, the names of the axes may be fixed.

<elements>

REQUIRED

This attribute must contain the number of elements along this axis of the array. For example, if the Array_2D in question has dimensions 112x256, then the <elements> value in the first <Axis_Array> would be "112".

<sequence_number>

REQUIRED

This number defines an order for the axes so that the <axis_index_order> value can be interpreted correctly for this Array. One of the axes must have a sequence_number of "1", the other "2". It is not necessary that the first Axis_Array class in the label have <sequence_number> equal to 1, but it would tend to make life easier for reviewers and users.
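To illustrate why the order of the <Axis_Array> classes in the label does not matter, here is a small sketch that sorts the axes by <sequence_number>. The label fragment is made up for the example (and written without XML namespaces), and `axes_in_sequence` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: the axis with sequence_number 2 is listed first.
LABEL_FRAGMENT = """
<Array_2D>
  <axes>2</axes>
  <axis_index_order>Last Index Fastest</axis_index_order>
  <Axis_Array>
    <axis_name>Wavelength</axis_name>
    <elements>256</elements>
    <sequence_number>2</sequence_number>
  </Axis_Array>
  <Axis_Array>
    <axis_name>Distance</axis_name>
    <elements>112</elements>
    <sequence_number>1</sequence_number>
  </Axis_Array>
</Array_2D>
"""

def axes_in_sequence(fragment):
    """Return [(axis_name, elements), ...] ordered by sequence_number."""
    root = ET.fromstring(fragment)
    axes = sorted(
        (int(a.findtext("sequence_number")),
         a.findtext("axis_name"),
         int(a.findtext("elements")))
        for a in root.iter("Axis_Array")
    )
    if [seq for seq, _, _ in axes] != [1, 2]:
        raise ValueError("expected sequence numbers 1 and 2")
    return [(name, count) for _, name, count in axes]

shape = axes_in_sequence(LABEL_FRAGMENT)
```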

<Band_Bin_Set>

OPTIONAL

This class provides an explicit definition of each bin (or point) along the axis. It is unlikely to be present as part of the Axis_Array class in the next release of the Information Model and Schemas, however, as a broader solution to the axis labelling problem is in active development. The result will likely be one or more classes that will appear in the <Discipline_Area> of the label, rather than as a subclass of the <Axis_Array> class.

So, use this class if it is appropriate, but expect that you will have to rewrite those labels after the next PDS4 release (most likely in late 2013).

<Special_Constants>

OPTIONAL

Use this class to define any flag values that appear in the data to indicate drop outs, saturation, and other conditions that render a single pixel unknown. Every attribute in this class is optional. If you don't need any of the special constants, don't include this class in your Array_2D_*.

<saturated_constant>

OPTIONAL

This value indicates the data value was lost because of detector saturation.

<missing_constant>

OPTIONAL

This value indicates the data value is known to be missing for some reason not covered by the other constants available in this class.

<error_constant>

OPTIONAL

This value indicates the data value originally reported was known to be in error for some reason, and was replaced by this flag.

<invalid_constant>

OPTIONAL

This value indicates the data value originally recorded or calculated was outside the valid range for array elements.

<unknown_constant>

OPTIONAL

This value indicates the data value in this file is unknown because it was unknown in the source and cannot be recovered.

<not_applicable_constant>

OPTIONAL

This value indicates that the concept underlying the datum is not applicable in a particular context.

<valid_maximum>

OPTIONAL

This value is the maximum possible observational value that might be in the data. This is useful if your flag values are greater than this value and you want to simplify the exclusion logic.

<high_instrument_saturation>

OPTIONAL

This value indicates the original datum was in the high-end saturation range of the instrument.

<high_representation_saturation>

OPTIONAL

This value is used to indicate that, while the original observed value was valid, it is out of range of the numeric format chosen for this Array_2D in a way that would be considered "too high" - absolute magnitude too great, positive value too large, or positive exponent too large to be represented.

<valid_minimum>

OPTIONAL

This value is the minimum possible observational value that might be in the data. This is useful if your flag values are less than this value and you want to simplify the exclusion logic.

<low_instrument_saturation>

OPTIONAL

This value indicates the original datum was in the low-end saturation range of the instrument.

<low_representation_saturation>

OPTIONAL

This value is used to indicate that, while the original observed value was valid, it is out of range of the numeric format chosen for this Array_2D in a way that would be considered "too low" - negative value too large or negative exponent too large to be represented.
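A reader might use these constants along the lines of the sketch below, which drops any stored value that matches a flag or falls outside the valid range. The function name `real_values` and the sample numbers (a flag of -999, a <valid_maximum> of 4095) are invented for illustration.

```python
def real_values(values, flags=(), valid_minimum=None, valid_maximum=None):
    """Keep only stored values that are neither flags nor out of range."""
    flag_set = set(flags)
    kept = []
    for v in values:
        if v in flag_set:
            continue
        if valid_minimum is not None and v < valid_minimum:
            continue
        if valid_maximum is not None and v > valid_maximum:
            continue
        kept.append(v)
    return kept

# -999 plays the role of a missing_constant; 65535 lies above valid_maximum.
clean = real_values([10, -999, 20, 65535, 30], flags=[-999], valid_maximum=4095)
```

When every flag lies above <valid_maximum> (or below <valid_minimum>), the range test alone is enough, which is the simplification of the exclusion logic mentioned above.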

<Object_Statistics>

OPTIONAL

This class provides a place for statistical values calculated from the real data values of the pixels in the array. Every attribute in this class is optional. If you don't need any of the statistics, don't include this class in your Array_2D_*.

<local_identifier>

OPTIONAL

If you need to refer to this specific set of Object_Statistics from elsewhere in this label, this is the place to attach an identifier to it. If your identifier looks like a variable name in a typical programming language, you should be OK.

<maximum>

OPTIONAL

Maximum real data value found in the array as it exists in its file. That is, after any flag values identified in the corresponding <Special_Constants> class are ignored and any relevant bit mask is applied, but before offset or scaling_factor are applied.

<minimum>

OPTIONAL

Minimum real data value found in the array as it exists in its file. That is, after any flag values identified in the corresponding <Special_Constants> class are ignored and any relevant bit mask is applied, but before offset or scaling_factor are applied.

<mean>

OPTIONAL

This is the arithmetic mean of the values in the array, excluding those elements containing flag values defined in the associated <Special_Constants> class, in the same units as the element. Any bit mask is applied before the calculation, but offset and scaling factor are not.

<standard_deviation>

OPTIONAL

This is the standard deviation of the array values about the <mean>, excluding those elements containing flag values defined in the associated <Special_Constants> class, in the same units as the element. Bit mask is applied; offset and scaling factor are not.

<bit_mask>

OPTIONAL

For values not aligned on word boundaries, this attribute contains the bit mask used to recover the value from the words after reading them into memory. Bit masks are formulated as a simple string of ones and zeroes. For example:

    <bit_mask>00011111</bit_mask>

Bit masks are applied before scaling factors and offsets, using the standard bitwise-and logical operation.
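As a small sketch of that operation, the mask string can be turned into an integer and combined with the stored word using bitwise AND. The function name `apply_bit_mask` and the sample word are made up for the example.

```python
def apply_bit_mask(word, bit_mask):
    """Bitwise-AND an integer word with a mask given as a string of 0s and 1s."""
    return word & int(bit_mask, 2)

# Keep only the low five bits of the byte 10110101.
masked = apply_bit_mask(0b10110101, "00011111")
```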

Notes: Obviously, a bit mask isn't a "statistic". This belongs in the array element definition class, not here, as it is essential to being able to read the data properly. And yet, here it remains...

Bit masks in general make it much more difficult to access and process the data, because each value must be carefully manipulated (taking into account byte order issues) before it can be stored into programmatic memory. Avoid them in SBN data unless absolutely, positively necessary. And then don't use them.

<median>

OPTIONAL

This attribute contains the median value of the real data values (excluding flag values) in the array, in the same units as the element. Any bit mask is applied prior to determining the median, but offset and scaling factor are not.
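The statistics described so far could be computed roughly as in the sketch below. It works on stored values (no offset or scaling factor applied), assumes any bit mask has already been applied, and skips flag values. The function name `stored_value_stats` is hypothetical, and the use of the population standard deviation is an assumption, since this page does not say which form is intended.

```python
import statistics

def stored_value_stats(values, flags=()):
    """Statistics of the real stored values, with flag values excluded."""
    flag_set = set(flags)
    real = [v for v in values if v not in flag_set]
    return {
        "minimum": min(real),
        "maximum": max(real),
        "mean": statistics.mean(real),
        # Assumption: population form; use statistics.stdev for the sample form.
        "standard_deviation": statistics.pstdev(real),
        "median": statistics.median(real),
    }

# -1 plays the role of a flag value here.
stats = stored_value_stats([2, 4, 4, 4, -1, 5, 5, 7, 9], flags=[-1])
```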

<md5_checksum>

OPTIONAL

This is the checksum of just the array data (that is, it might be a checksum of part of a file), calculated using the MD5 algorithm. To calculate this checksum, the data comprising the array must be treated as a simple sequence of bytes as they come from the file. So no bit masks, offsets, scaling factors, or even byte-swapping are applied.

For the hex digits in the value, you must use the lowercase letters a-f.

<maximum_scaled_value>

OPTIONAL

This is the maximum observational value represented in the array. Flag values are excluded; bit mask, scaling factor, and offset are all applied before determining this value.

<minimum_scaled_value>

OPTIONAL

This is the minimum observational value represented in the array. Flag values are excluded; bit mask, scaling factor, and offset are all applied before determining this value.
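The two scaled extremes might be obtained as in the sketch below: flag values are dropped first, then the scaling factor and value offset are applied, and only then are the minimum and maximum taken. The function name `scaled_extremes` and the sample values are hypothetical, and any bit mask is assumed to have been applied already.

```python
def scaled_extremes(stored, flags=(), scaling_factor=1.0, value_offset=0.0):
    """Return (minimum_scaled_value, maximum_scaled_value) for real data."""
    flag_set = set(flags)
    scaled = [v * scaling_factor + value_offset
              for v in stored if v not in flag_set]
    return min(scaled), max(scaled)

# 255 plays the role of a flag value and is ignored.
low, high = scaled_extremes([0, 10, 255, 20], flags=[255],
                            scaling_factor=0.5, value_offset=-1.0)
```

Taking the extremes after scaling, rather than scaling the stored extremes, also gives the right answer when the scaling factor is negative.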

<description>

OPTIONAL

If you need to provide any additional information or caveats about the statistics, this is the place to do it.