Notes for Labelling FITS files

From The SBN Wiki

FITS tables, images and arrays are all compliant as they are with the PDS4 data formatting standards. Follow the guidelines below for filling in the PDS4 attribute values.

Special Note About Signed/Unsigned Integers

There seems to be an occasional misunderstanding about Table 19 in the FITS standard document that affects both arrays and binary tables.

The only integer data types stored in a compliant FITS file are unsigned 8-bit bytes and signed 2-, 4-, and 8-byte integers. Table 19 of the FITS standard does give the correct offset for converting each of these types to their signed and unsigned counterparts, respectively, provided you have properly converted the numbers from one hardware storage type to the other. It is not true that you can, for example, just store unconverted unsigned 2-byte integers in your FITS file, add the offset from Table 19, and expect to get the correct (unsigned) value back from a compliant FITS reader.

The table in the FITS standard assumes that, before you write your FITS file, you properly convert unsigned integers to signed integers by subtracting the offset value given and converting to a signed hardware data type. If you have not done this, then the offset required to recover the original unsigned value is, in fact, twice the offset value shown in that table if the value read in is negative, and zero if it is positive. (And conversely going from signed to unsigned bytes.)
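As a minimal sketch of the point above, the following Python uses only the standard struct module to compare the correct procedure with the incorrect one for a 2-byte field. The sample value 40000 and the helper names are illustrative assumptions; 32768 is the offset the FITS standard gives for unsigned 16-bit data.

```python
import struct

OFFSET_UINT16 = 32768  # offset for unsigned 16-bit data in the FITS standard


def to_fits_int16(unsigned_value):
    """Subtract the offset so the value fits a signed 2-byte integer."""
    return unsigned_value - OFFSET_UINT16


def read_as_int16(raw_bytes):
    """Read two big-endian bytes as a signed integer, as a FITS reader does."""
    return struct.unpack(">h", raw_bytes)[0]


original = 40000  # illustrative unsigned 16-bit value

# Correct: convert to signed before writing; the reader adds the offset back.
stored = struct.pack(">h", to_fits_int16(original))
recovered = read_as_int16(stored) + OFFSET_UINT16

# Incorrect: write the unconverted unsigned bytes into the file.
raw = struct.pack(">H", original)
misread = read_as_int16(raw)        # the reader sees a negative number
wrong = misread + OFFSET_UINT16     # adding the table offset does not help
# Recovery needs twice the offset for negative values, zero otherwise.
fixed = misread + (2 * OFFSET_UINT16 if misread < 0 else 0)

print(recovered, misread, wrong, fixed)  # 40000 -25536 7232 40000
```

Only the first procedure yields a file that a compliant reader interprets correctly with the standard offset.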

So, if you are migrating a PDS3-labeled FITS file and you see a DATA_TYPE keyword in the PDS3 label describing a one-byte field as an "MSB_INTEGER", or a multi-byte field as an "MSB_UNSIGNED_INTEGER", you should be extremely skeptical about that data type description in the PDS3 label, and possibly in the FITS header as well. Make sure that the offsets are consistent with the values expected and found in the file, and report any discrepancies you discover.

In all cases the PDS4 <data_type> must be the data type used to read the values in the file, and not the data type you might ultimately want to end up with. That conversion is a private matter between the end user and his software.

Headers

Use the PDS4 Header object to describe any type of FITS header.

  • The <object_length> must be the number of FITS blocks comprising the header * 2880 (the size of a single FITS block).
  • For <parsing_standard_id>, use "FITS 3.0".
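The <object_length> calculation can be sketched as follows, assuming the header is available as bytes and ends with a standard END card. The function name and the sample header are hypothetical and serve only to illustrate the block arithmetic.

```python
FITS_BLOCK = 2880  # size of a single FITS block in bytes
CARD = 80          # size of a single header card in bytes


def header_object_length(data, start=0):
    """Return the length in bytes of the FITS header beginning at `start`."""
    pos = start
    while pos + CARD <= len(data):
        card = data[pos:pos + CARD]
        pos += CARD
        if card[:8] == b"END     ":
            used = pos - start
            blocks = -(-used // FITS_BLOCK)  # round up to whole blocks
            return blocks * FITS_BLOCK
    raise ValueError("no END card found")


# Illustrative minimal header: four cards padded out to one full block.
cards = [
    b"SIMPLE  =                    T",
    b"BITPIX  =                    8",
    b"NAXIS   =                    0",
    b"END",
]
header = b"".join(c.ljust(CARD) for c in cards).ljust(FITS_BLOCK)
print(header_object_length(header))  # 2880
```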

Arrays

This category includes both the primary data array, following the first FITS header in the file, and the data in any IMAGE extension. Use <Array_2D_Image> for any 2D data that can be reasonably considered an image. For other data, use the most specific and relevant flavor of Array_* available.

Axis Ordering

FITS array data are stored such that the NAXIS1 subscript varies fastest, NAXIS2 next fastest, and so on up to NAXISn. So the storage order is first-index-fastest in FITS notation.

In PDS, arrays are stored so that axis 1 (the axis described by the <Axis_Array> that has <sequence_number> of "1") varies least rapidly, axis 2 next least rapidly, and so on to axis n.

So, when labeling a FITS array as a PDS Array-type object, the highest-numbered NAXIS* becomes axis 1 in the PDS array, the next-highest NAXIS* becomes axis 2, and so on.
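The reversal described above can be sketched in a few lines of Python. The function name and the tuple layout are illustrative assumptions, not part of any PDS tool.

```python
def pds_axes_from_naxis(naxis_lengths):
    """Map [NAXIS1, NAXIS2, ...] to PDS axes in sequence_number order.

    Returns a list of (sequence_number, FITS keyword, elements) tuples.
    """
    n = len(naxis_lengths)
    return [
        (seq, f"NAXIS{n - seq + 1}", naxis_lengths[n - seq])
        for seq in range(1, n + 1)
    ]


# Illustrative cube with NAXIS1=1024, NAXIS2=512, NAXIS3=3.
for axis in pds_axes_from_naxis([1024, 512, 3]):
    print(axis)
# (1, 'NAXIS3', 3)
# (2, 'NAXIS2', 512)
# (3, 'NAXIS1', 1024)
```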

Array Element Description

In the FITS primary data header or image extension header, the following reserved FITS keywords also have direct PDS4 equivalents in the <Element_Array> class.

FITS keyword | PDS4 Array
BSCALE | <Element_Array>/<scaling_factor>
BZERO | <Element_Array>/<value_offset>
BUNIT | <Element_Array>/<unit>
BLANK (integer data only) | see below
DATAMAX | <Object_Statistics>/<maximum_scaled_value>
DATAMIN | <Object_Statistics>/<minimum_scaled_value>

The BLANK keyword, when used according to the FITS standard, may only be used with integer data. It may either be blank (if BSCALE is 1 and BZERO is 0) or contain a flag value indicating null data. If you need to specify a null value, add the <Special_Constants> class to the Array class, and select the appropriate special constant for the case at hand. Note that the BLANK value is the value read from the file - before any scale factor and offset are applied.
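The order of operations matters here: the null flag is tested against the stored value first, and only then are the scale factor and offset applied (physical value = BZERO + BSCALE * stored value). A minimal sketch, with a hypothetical function name and sample values:

```python
def scaled_values(stored, bscale=1.0, bzero=0.0, blank=None):
    """Apply FITS scaling to stored integers, mapping BLANK to None.

    The BLANK comparison uses the raw stored value, before scaling.
    """
    result = []
    for value in stored:
        if blank is not None and value == blank:
            result.append(None)  # null data: do not scale
        else:
            result.append(bzero + bscale * value)
    return result


# Illustrative values: BSCALE=0.5, BZERO=100.0, BLANK=-32768.
print(scaled_values([0, 10, -32768], bscale=0.5, bzero=100.0, blank=-32768))
# [100.0, 105.0, None]
```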

Scalar Data Types

In both the primary data array and IMAGE extensions, the data type is indicated by the value of the BITPIX keyword.

BITPIX Value | PDS Data Type(s)
8 | UnsignedByte -or- 7-bit ASCII character
16 | SignedMSB2 (integer)
32 | SignedMSB4 (integer)
-32 | IEEE754MSBSingle (float)
64 | SignedMSB8 (integer)
-64 | IEEE754MSBDouble (float)


2D Images

In addition to the above correspondences, the PDS4 Array_2D_Image class has the following additional correspondences.

  • NAXIS1 corresponds to axis 2, and axis 2 must have an <axis_name> of "Sample".
  • NAXIS2 corresponds to axis 1, and axis 1 must have an <axis_name> of "Line".

Also note that the FITS standard says nothing about display direction. It seems to be universally true that samples (NAXIS1) are always drawn left-to-right; but lines are drawn either top-to-bottom or bottom-to-top with roughly equal frequency. You will have to look at your FITS images and determine the correct order of display.

The display order is specified in the <Display_2D_Image> class. It is optional according to the PDS master schema, but the SBN will always require that you define the display orientation of your images.


Note: It seems highly likely that the <Display_2D_Image> class will not be in later versions of the PDS4 schemas, and will instead be replaced by a discipline-level class to provide the same information in a more general context that can be applied to both 2D and higher-dimensional arrays. Use it anyway for now, and SBN will provide a script to convert Display_2D_Image to whatever the new class is once a new master schema is released.

Binary Tables

Use a <Table_Binary> class for FITS binary table (BINTABLE) extensions, abbreviated BINTAB below.

Note that the FITS standard requires that the BINTAB data be padded with null ("\0") from the end of the last record to the end of the containing 2880-byte FITS block. There is no requirement to document these padding bytes in the PDS4 label, and they are generally ignored.

Field Descriptions

In the FITS BINTAB extension, the field description keywords, for the most part, translate directly to PDS4 Field_Binary class attributes. Note that the only required keyword for each field is TFORM:

FITS keyword | PDS4 Field_Binary
TTYPE | <name>
TUNIT | <unit>
TSCAL | <scaling_factor>
TZERO | <value_offset>
TNULL (integer data only) | see below
TDISP | see below
TFORM | <data_type> (see below)

TNULL is only properly used for integer data. It indicates a null data flag, and corresponds to the value found in the file - before scaling or offset are applied. It can be translated into one of the specific flags in the <Special_Constants> class, depending on the circumstances of the data.

TDISP provides a recommended print format for the binary data, using a limited set of FORTRAN-like specifiers. In PDS4 the specifiers follow the POSIX standard and are described on the PDS4_field_format_Conventions page.
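As a rough sketch of such a translation, the following converts a few common TDISP forms (Aw, Iw, Fw.d, Ew.d) into printf-style strings. The function name and the choice of conversion letters are assumptions made for illustration; check the PDS4_field_format_Conventions page for the exact syntax PDS4 accepts before relying on it.

```python
import re

# Assumed mapping from FITS TDISP type codes to printf-style conversions.
_CONVERSIONS = {"A": "s", "I": "d", "F": "f", "E": "e"}


def tdisp_to_field_format(tdisp):
    """Translate a simple TDISP value such as 'F8.3' into '%8.3f'."""
    match = re.fullmatch(r"([AIFE])(\d+)(?:\.(\d+))?", tdisp.strip().upper())
    if match is None:
        raise ValueError(f"unsupported TDISP value: {tdisp!r}")
    code, width, precision = match.groups()
    fmt = "%" + width
    # Keep the precision only for real-number formats.
    if precision is not None and code in "FE":
        fmt += "." + precision
    return fmt + _CONVERSIONS[code]


print(tdisp_to_field_format("F8.3"))   # %8.3f
print(tdisp_to_field_format("I5"))     # %5d
print(tdisp_to_field_format("E12.4"))  # %12.4e
```

Other TDISP forms (binary, octal, hexadecimal, engineering notation, and so on) are deliberately rejected here rather than guessed at.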

Note: The TFORM value may include a repetition count (as the first number in the string). If this number is greater than one, the corresponding FITS field must be described in the PDS4 label as a <Group_Field_Binary>, containing just one <Field_Binary> (as described by the rest of the field definition), and a <repetitions> value equal to the repetition count from the corresponding TFORM keyword. Alternatively, you can turn the one FITS field definition into that many separate <Field_Binary> definitions without using a <Group_Field_Binary> class. (For type A, the leading number is the length of the character string, not a repetition.)

TFORM is used in a BINTAB table to define the storage data type of the associated field. These and their PDS4 equivalents are enumerated below.

Note: In the BINTAB extension, there is no way to specify the location of a particular field as an offset from the beginning of the record, as the <field_location> attribute does for PDS4 files. This means two things:

  1. Every byte in the BINTAB record must be included in one of the defined fields - there is no undeclared gutter space, spare bytes, or record delimiters in FITS BINTAB data. On top of that, the fields must be defined in the order in which they appear in the file.
  2. You will have to calculate the <field_location> for each field in the record by adding up the total sizes of all preceding fields. Don't forget the repetition counts in your TFORM fields in your location math!
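The location arithmetic in the two points above can be sketched as follows. This assumes that <field_location> counts bytes from 1 at the start of the record, and it handles only the scalar type letters listed in the next section; the function name and the sample TFORM values are hypothetical.

```python
import re

# Size in bytes of one element of each scalar BINTAB type letter.
ELEMENT_SIZE = {"B": 1, "I": 2, "J": 4, "K": 8, "A": 1, "E": 4, "D": 8}


def field_locations(tforms):
    """Return (list of 1-based field locations, record length in bytes)."""
    locations = []
    offset = 1  # assumed convention: the first byte of the record is 1
    for tform in tforms:
        match = re.fullmatch(r"(\d*)([A-Z])", tform.strip())
        if match is None or match.group(2) not in ELEMENT_SIZE:
            raise ValueError(f"unsupported TFORM value: {tform!r}")
        repeat = int(match.group(1)) if match.group(1) else 1
        locations.append(offset)
        offset += repeat * ELEMENT_SIZE[match.group(2)]
    return locations, offset - 1


# Illustrative record: one 4-byte integer, three floats, an 8-character
# string, and one double.
print(field_locations(["1J", "3E", "8A", "D"]))  # ([1, 5, 17, 25], 32)
```

The record length returned should agree with NAXIS1 in the extension header, which makes a convenient consistency check.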

Scalar Data Types

TFORM will have to be translated to the PDS4 equivalent. The FITS BINTAB scalar types that are directly supported by PDS4 binary tables are:

FITS Type Letter | PDS4 <data_type>
B | UnsignedByte
I | SignedMSB2
J | SignedMSB4
K | SignedMSB8
A | ASCII_String (or any other appropriate ASCII_* type)
E | IEEE754MSBSingle
D | IEEE754MSBDouble

"Other appropriate ASCII_* types" include things like ASCII_Date_Time for strings representing times in the standard format, for example.

Complex, Vector, and Array Fields

In addition to the scalar types, the FITS standard allows a field to contain a complex number, a repeating value (basically, a vector), or an array descriptor. In PDS4, complex types are not directly supported. Simple repeating values (vectors) can be represented using a <Group_Field_Binary>. Arrays probably can as well, but the method has not yet been defined. So if your FITS binary table contains complex values or array descriptors, contact your PDS consultant ASAP for additional instructions.

ASCII Tables

FITS character tables will always appear as TABLE extensions. Only 7-bit ASCII characters are permitted in TABLEs, and the only non-printable characters permitted in FITS or PDS4 character tables are the blank character and the carriage-return and linefeed characters, the latter two of which may only be used as record delimiters.

Note that the FITS standard requires that the TABLE data be padded with blanks to a whole number of 2880-byte FITS blocks. There is no requirement that these padding characters be described in the PDS4 label, and they're generally ignored.
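If you want to check how many padding blanks to expect, the count follows directly from the record length and record count. A small sketch, with a hypothetical function name and sample sizes:

```python
FITS_BLOCK = 2880  # size of a single FITS block in bytes


def table_padding(naxis1, naxis2):
    """Return the number of padding bytes after a TABLE data area.

    naxis1 is the record length in bytes and naxis2 the number of records.
    """
    data_bytes = naxis1 * naxis2
    return (-data_bytes) % FITS_BLOCK


print(table_padding(80, 100))  # 640
print(table_padding(80, 36))   # 0
```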

In the TABLE extension header:

  • NAXIS1 corresponds to the <record_length> attribute in the <Record_Character> class.
  • NAXIS2 corresponds to the <records> attribute in the <Table_Character> class.
  • TFIELDS corresponds to the <fields> attribute in the <Record_Character> class. (Unlike in BINTAB tables, ASCII tables may not have fields that are arrays, vectors, or complex numbers.)
The attribute <group_fields> or something similar may be required in later releases. For TABLE extensions, this attribute will always have a value of "0" (zero).

Field Descriptions

Each field must have a corresponding TBCOL and TFORM keyword in the FITS TABLE header. The keywords used to define the fields have close analogs in the PDS4 <Field_Character> class:

FITS keyword | PDS4 Field_Character
TBCOL | <field_location>
TFORM | <data_type> (see following)
TTYPE | <name>
TUNIT | <unit>
TSCAL | <scaling_factor>
TZERO | <value_offset>
TNULL | see below
TDISP | see below

TNULL is a string used as a null data flag for the field and corresponds to the value found in the file - before scaling or offset are applied. It can be translated into one of the specific flags in the <Special_Constants> class, depending on the circumstances of the data.


TDISP is an optional keyword that can be used to contain a more specific display format than that implied by the TFORM keyword.

Scalar Data Types

TFORM is required for every field to indicate data type. It has a syntax similar to a FORTRAN format specifier, with a type specifier followed by a total field width ('w') and, for real values, a precision ('d'). Unlike the keyword of the same name in BINTAB extensions, TFORM values in TABLE extensions may not contain repetition counts in a compliant FITS file. [Note: Non-compliant FITS files have been known...]


The correspondence to PDS4 <data_type> is straightforward for numeric types:

TFORM | PDS4 <data_type>
Aw | ASCII_String (or more specific ASCII_* string type)
Iw | ASCII_Integer
Fw.d | ASCII_Real
Ew.d | ASCII_Real
Dw.d | ASCII_Real

"More specific ASCII_* string types" include, for example, ASCII_Date_Time for date/time fields conforming to the standard ISO 8601 format.