Filling Out the Table Character Data Structure

From The SBN Wiki
Revision as of 00:00, 26 January 2013 by Raugh (talk | contribs)

The <Table_Character> class contains the data structure definition for a character table. Each row in the table has the same structure, defined in a Record_Character class. Records are themselves composed of scalar fields, or sub-records called "grouped fields".

For additional explanation, see the PDS4 Standards Reference, or contact your PDS node consultant.

Following are the attributes and subclasses you'll find in <Table_Character>, in label order.

Note that in the PDS4 master schema, all classes have capitalized names; attributes never do.

<name>

OPTIONAL

If you'd like to give this table a descriptive name, here's the place to do it.

<local_identifier>

OPTIONAL

If you want to reference this Table_Character from somewhere else in this label, give it a formal label here. Use an identifier that would make a valid variable name in a typical programming language, and you should be OK syntactically.

<offset>

REQUIRED

The offset, in bytes, from the beginning of the file holding the table data to the first character in the table. You must specify the unit for this keyword, like this:

    <offset unit="byte">0</offset>

<records>

REQUIRED

This is the total number of records in the table. Note that in a Table_Character table, every record, including the last one, must end with a carriage-return/linefeed record delimiter.

<encoding_type>

REQUIRED

This must have the value Character.

<description>

OPTIONAL

This attribute provides a place for additional free-format text comments.

<record_delimiter>

REQUIRED

This attribute must have the value carriage_return line_feed. The data file must also have carriage-return/linefeed record delimiters, of course.
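As a rough illustration of the delimiter rule, the following sketch checks that every record in a table ends with carriage-return/linefeed. The function name `check_records` and the sample bytes are hypothetical; the arguments stand in for the label values of `<offset>`, `<records>`, and `<record_length>`, and this is not an official PDS validation routine.

```python
def check_records(data, offset, records, record_length):
    """Return True if every record in the table ends with CR/LF.

    offset, records and record_length mirror the label attributes of the
    same names; offset and record_length are in bytes.
    """
    table = data[offset:offset + records * record_length]
    if len(table) != records * record_length:
        return False  # the file is too short to hold the declared table
    for i in range(records):
        record = table[i * record_length:(i + 1) * record_length]
        if not record.endswith(b"\r\n"):
            return False
    return True


# Hypothetical file: a 4-byte header followed by two 10-byte records.
sample = b"HDR:" + b"  1 2.50\r\n" + b"  2 3.75\r\n"
print(check_records(sample, offset=4, records=2, record_length=10))
```

A check like this fails on files whose last record lacks the delimiter, which is an easy mistake to make when a table is written by hand.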

<Uniformly_Sampled>

OPTIONAL

If this Table_Character contains records which are uniformly spaced in some dimension (time, wavelength, distance, etc.), you can use this class to define that dimension and interval rather than including an additional field in each row to hold the value explicitly.

Note: There are certain types of data where this class can prevent a very large data file from increasing in size by 50%. Such tables are predominantly used by software, not by human readers scanning them by eye. SBN users generally prefer to see the additional column, so unless both conditions apply to your data (a very large file, read mainly by software), you should include the extra column.

<sampling_parameter_name>

REQUIRED

The name of the dimension of sampling (wavelength, time, etc.)

<sampling_parameter_interval>

REQUIRED

Distance between records in units of the sampling parameter. So if you're sampling in time, the interval might be 100 milliseconds, for example.

<sampling_parameter_unit>

REQUIRED

The unit associated with the sampling_parameter_interval.

<first_sampling_parameter_value>

REQUIRED

The value of the sampling parameter at the point where the data of the first record were recorded.

<last_sampling_parameter_value>

REQUIRED

The value of the sampling parameter at the point where the data of the last record were recorded.

<sampling_parameter_scale>

OPTIONAL

This is actually the type of the scale. It must be one of the standard values Exponential, Linear, or Logarithmic.
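To show how a reader might recover the implicit column from these attributes, here is a minimal sketch that assumes a Linear scale. The function name `sampling_values` and the sample numbers are hypothetical, and the other scale types are not covered because their spacing rule is not spelled out on this page.

```python
def sampling_values(first, interval, records):
    """Reconstruct the sampling parameter value of each record.

    Assumes a Linear sampling_parameter_scale: record i (counting from 0)
    sits at first_sampling_parameter_value + i * sampling_parameter_interval.
    """
    return [first + i * interval for i in range(records)]


# Hypothetical table: 5 records sampled every 0.5 units, starting at 0.0.
values = sampling_values(first=0.0, interval=0.5, records=5)
print(values)  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

Under that assumption, the final element should agree with last_sampling_parameter_value (2.0 in this made-up case), which makes a convenient consistency check on the label.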


<Record_Character>

REQUIRED

This class describes the structure of one complete record in the table.

<fields>

REQUIRED

The number of fields in the Record.

Note: There is some controversy about what this value should be when Group Fields are present. For the time being, use the data dictionary definition: the number of fields in the record is the total number of scalar values in the record. That is, it is the number of Field definitions directly in the record, plus, for each Group Field, the number of Field definitions in that group multiplied by the group's repetitions value.
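The counting rule can be sketched as a short recursive function. The representation here is purely illustrative: a Field is a string, and a Group Field is a hypothetical `(repetitions, members)` tuple. Applying the rule recursively to nested groups is an assumption, since that case is part of the open debate.

```python
def count_fields(members):
    """Count scalar values in a record under the data dictionary rule.

    Each plain Field counts once; each Group Field contributes the count
    of its own members multiplied by its repetitions value.
    """
    total = 0
    for member in members:
        if isinstance(member, tuple):
            repetitions, group_members = member
            total += repetitions * count_fields(group_members)
        else:
            total += 1
    return total


# Hypothetical record: two scalar Fields plus a group of two Fields
# that repeats three times.
record = ["TIME", "FILTER", (3, ["FLUX", "FLUX_ERROR"])]
print(count_fields(record))  # 2 + 3 * 2 = 8
```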

<record_length>

REQUIRED

The total length of the record, including all fields, all repetitions of group fields, any space between fields, and the record delimiter. You must specify a unit of bytes for this value:

    <record_length unit="byte">1234</record_length>

Records are composed of Fields and Group Fields. A Record must have at least one of those, and can have an arbitrary number of them, in any order (that is, you can have Fields and Group Fields interspersed).

There are currently serious arguments underway about Group Fields, in particular about how to determine the correct value for the required <fields> attribute, above. Because of this, SBN data preparers should not use Group Fields until the disagreements have been resolved. Group Fields are never necessary - they are a notational convenience to save writing out large numbers of similar Field definitions.

<Field_Character>

This class defines a single scalar field.

<name>

REQUIRED

The name of the field. SBN recommends that this be something fairly human-readable that can be easily turned into a variable name for use in applications, or displayed as a meaningful column heading.

<field_number>

OPTIONAL

This is the sequential number of the Field definition. This is poorly defined when Group Fields are present, and should probably not be used at all in that case. The field_number is intended to be a help to human readers trying to map field definitions to columns in a print-out of the Table.

<field_location>

REQUIRED

This is the location, in bytes, of the first character in the field. (Note that locations begin with one, rather than zero.) You must indicate a unit of bytes for this field:

    <field_location unit="byte">1</field_location>

<data_type>

REQUIRED

The type of the values in the field. This must be one of the values listed in the Standard Values Quick Reference.

<field_length>

REQUIRED

The length of the field, in bytes. You must specify the unit:

    <field_length unit="byte">12</field_length>
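Because field_location counts from one while most programming languages index from zero, it is easy to be off by one when reading a field. The sketch below shows the conversion; the function name `extract_field`, the sample record, and the choice of ASCII decoding are assumptions for illustration.

```python
def extract_field(record, field_location, field_length):
    """Return the text of one field from a single record.

    field_location is 1-based (as in the label), so subtract one to get
    the 0-based start index; field_length is in bytes.
    """
    start = field_location - 1
    return record[start:start + field_length].decode("ascii")


# Hypothetical 10-byte record: a 3-byte integer field at location 1 and
# a 4-byte real field at location 5, followed by CR/LF.
record = b"  1 2.50\r\n"
print(repr(extract_field(record, 1, 3)))  # '  1'
print(repr(extract_field(record, 5, 4)))  # '2.50'
```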

<field_format>

OPTIONAL

The value of this attribute is a string representing the read/print format for the data in the field, using a subset of the POSIX print conventions.

Note: The syntax of the content of this field is poorly defined in the current data dictionary.

The SBN has defined a subset of the POSIX standard for use in SBN data sets on the PDS4_field_format_Conventions page.

SBN will require that this attribute be present in all Field definitions. It is used for validation of the Table contents.

<unit>

OPTIONAL

If the value in this field has an associated unit, this is where it goes. This value is case sensitive, and you may use characters from the UTF-8 character set (like the Angstrom symbol) where appropriate.

Note: If a field contains a unitless value, then there should be no <unit> attribute. NEVER include a null unit value, or even worse, this: <unit>N/A</unit>.

<scaling_factor>

OPTIONAL

If the data in this field are scaled, this attribute should contain the value the data must be multiplied by to get back to the original value. Scaling factors are applied prior to adding any offset.

<value_offset>

OPTIONAL

If the values in the field have been shifted by an offset, this attribute should contain the value that must be added to each field value to get back to the original value. Offsets are added after the scaling factor, if any, has been applied.
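The order of operations described for scaling_factor and value_offset amounts to original = stored × scaling_factor + value_offset. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def unscale(stored, scaling_factor=1.0, value_offset=0.0):
    """Recover the original value from a stored field value.

    The scaling factor is applied first, then the offset is added.
    The defaults leave the value unchanged, matching a field that has
    neither attribute.
    """
    return stored * scaling_factor + value_offset


# Hypothetical field: stored value 1234, scaled by 0.5, offset by 100.
print(unscale(1234, scaling_factor=0.5, value_offset=100.0))  # 717.0
```

Reversing the order (adding the offset before multiplying) would give 667.0 here, so the sequence matters whenever both attributes are present.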

<description>

OPTIONAL

Free-format text describing the content of the field.

Note: While not required, SBN expects to see a useful definition for every Field, as do both reviewers and users. Omit this field at your peril.

<Special_Constants>

OPTIONAL

This class defines flag values used to indicate that a particular field value is unknown for one reason or another. It is identical to the <Special_Constants> class used in the Array classes. For details, check the Filling Out the Array 2D Data Structure - <Special_Constants> page. Here is a quick list of the special constants available in this class:

  • saturated_constant
  • missing_constant
  • error_constant
  • invalid_constant
  • unknown_constant
  • not_applicable_constant
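As a loose illustration of how software might use these flags, the sketch below compares a field value against a set of special constants before treating it as data. The function name `classify` and the flag values are hypothetical; real flag values come from the label.

```python
def classify(value, special_constants):
    """Return the name of the special constant that value matches, or None.

    special_constants maps attribute names (for example missing_constant)
    to the flag values declared in the label.
    """
    for name, flag in special_constants.items():
        if value == flag:
            return name
    return None


# Hypothetical flag values for one field.
flags = {"missing_constant": -999.0, "saturated_constant": 65535.0}
print(classify(-999.0, flags))  # missing_constant
print(classify(12.5, flags))    # None
```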


<Field_Statistics>

OPTIONAL

If you want to include things like extrema, mean value, and such for all the values that occur in this field through all the records in the table, this is the place to do it. This class is identical for all Field types. For details, see Filling Out the Field Statistics Class. Here is a quick list of the field statistics available in this class:

  • maximum
  • minimum
  • mean
  • standard_deviation
  • median


<Group_Field_Character>

This class defines a set of Fields and Group Fields that repeats a given number of times in each record. Group Fields may be nested.

Note: Unless you have three good reasons, don't use Group Fields in SBN data.

<repetitions>

REQUIRED

The number of times the complete set of Fields and Group Fields comprising this <Group_Field_Character> repeats.

Note: The minimum value for this field listed in the data dictionary is one, but no product will pass SBN review unless this value is at least two.

<fields>

REQUIRED

This is not well-defined, and is the subject of debate as I type this.

Note: The current data dictionary definition of this value is actually the definition for Record_Character, which is at the least grammatically problematic.

<group_location>

REQUIRED

This is the location of the first byte of the first field of the first repetition of this group. You must specify a unit of "byte" for this value. If the group starts at the beginning of the containing Record or Group, this has a value of one. For example:

    <group_location unit="byte">1</group_location>

Note: The data dictionary definition for this attribute in Group_Field_Character references the wrong class name. Ignore it.

<Fields> and Nested <Group_Fields>

As in the Record, the Group may contain Fields, Group_Fields, or both intermixed. Group_Fields may be nested arbitrarily deeply. The requirements for these data structure classes inside a <Group_Field_Character> are identical to those above.