Python PDS4 Tools

From The SBN Wiki
Revision as of 20:01, 9 June 2015 by Lnagdi1 (Updated docstring)

Introduction

This document describes the current status and usage of tools developed at PDS-SBN to read and visualize PDS4 data in Python. Please note that a more feature-complete PDS4 reader and visualizer is also available in IDL.

Reading PDS4 Images

Introduction

An example Python module that can read and display an image from a PDS4 data product is available. The code reads the data based on the label keywords, but does not otherwise validate the label. If the user wants to display the image, the code uses the label's Display_Settings class to provide a copy of the image in the correct orientation for drawing with the origin in the lower left corner. The code below is designed for reading BOPPS BIRC images, but can serve as an example for other, similarly limited problems. A more general solution will likely use a different approach.

Contact Mike Kelley with questions or comments regarding this code or its description.

Requirements

This example assumes the user is running Python 2.7, with a recent NumPy package installed. The visualization example uses matplotlib.

Goal and Method

The goal is to read an image from a BOPPS BIRC data product into a NumPy array and to provide the correct orientation for display. The user passes the name of the label to a function, which will then:

  1. Open the label.
  2. Find the data product file name.
  3. Determine the Array_2D_Image data type and shape.
  4. Read in the data array.
  5. Return the array and metadata in a single object.
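Steps 2 and 3 can be sketched with ElementTree on a small inline label. The label fragment below is hypothetical and heavily trimmed, since a real PDS4 label contains many more required classes. The function name describe_first_image and the choice to order axes by sequence_number are also assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {'pds4': 'http://pds.nasa.gov/pds4/pds/v1'}

# Hypothetical label fragment: only the elements read below are included.
LABEL = """<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1">
  <File_Area_Observational>
    <File><file_name> image.dat </file_name></File>
    <Array_2D_Image>
      <local_identifier>image</local_identifier>
      <offset unit="byte">0</offset>
      <Element_Array><data_type>IEEE754MSBSingle</data_type></Element_Array>
      <Axis_Array>
        <axis_name>Line</axis_name>
        <elements>2</elements>
        <sequence_number>1</sequence_number>
      </Axis_Array>
      <Axis_Array>
        <axis_name>Sample</axis_name>
        <elements>3</elements>
        <sequence_number>2</sequence_number>
      </Axis_Array>
    </Array_2D_Image>
  </File_Area_Observational>
</Product_Observational>"""


def describe_first_image(root):
    """Return (file name, data type, shape, byte offset) for the first
    Array_2D_Image found in a File_Area_Observational."""
    # The [pds4:Array_2D_Image] predicate keeps only file areas that
    # contain an image array.
    file_area = root.find(
        'pds4:File_Area_Observational[pds4:Array_2D_Image]', NS)
    array = file_area.find('pds4:Array_2D_Image', NS)
    file_name = file_area.find('pds4:File/pds4:file_name', NS).text.strip()
    data_type = array.find('pds4:Element_Array/pds4:data_type',
                           NS).text.strip()
    # Order the axes by sequence_number rather than by document order.
    axes = sorted(array.findall('pds4:Axis_Array', NS),
                  key=lambda a: int(a.find('pds4:sequence_number', NS).text))
    shape = tuple(int(a.find('pds4:elements', NS).text) for a in axes)
    offset = int(array.find('pds4:offset', NS).text)
    return file_name, data_type, shape, offset


print(describe_first_image(ET.fromstring(LABEL)))
```

The file name is still relative to the label at this point. As in the minimal working example further down, it would be joined with os.path.dirname() of the label path before opening.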

The object will have two attributes that allow access to the data:

  1. the data with the axis order and orientation as provided in the file, and
  2. the data with the axis order and orientation reconfigured according to the label's Display_Settings class, so that it will have the correct orientation if drawn with the origin in the lower left corner.
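The reorientation in item 2 comes down to an optional transpose, which brings the vertical display axis first, followed by optional flips. The sketch below uses plain nested lists so that it runs without NumPy. The function name and its arguments are hypothetical. The direction strings are the Display_Direction values that the minimal working example below checks.

```python
def to_display_orientation(data, vertical_axis, directions):
    """Reorient a 2-D image (a list of rows) for drawing with the origin
    in the lower left corner.

    vertical_axis: 0 if the array's first axis is the vertical display
        axis, 1 if its second axis is.
    directions: (horizontal, vertical) display direction strings,
        e.g. ('Left to Right', 'Top to Bottom').
    """
    rows = [list(r) for r in data]          # copy; do not modify the input
    if vertical_axis == 1:
        # Transpose so the vertical display axis becomes axis 0 (rows),
        # like np.rollaxis(data, 1, 0) for a 2-D array.
        rows = [list(col) for col in zip(*rows)]
    horizontal, vertical = directions
    if horizontal == 'Right to Left':
        rows = [r[::-1] for r in rows]      # like display_data[:, ::-1]
    if vertical == 'Top to Bottom':
        rows = rows[::-1]                   # like display_data[::-1]
    return rows


image = [[1, 2, 3], [4, 5, 6]]
print(to_display_orientation(image, 0, ('Left to Right', 'Top to Bottom')))
```

With NumPy, the same steps are np.rollaxis (or np.transpose) followed by reversed slices. Slicing returns views, so no pixel data is copied until the array is written to.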

Implementation Details

For this basic example, we designed the reader as a function in a module named birc_example_reader. The user calls a single function, birc.read_image() (presumably with the module imported as birc), passing the name of the label as the first argument. The function loads the label using the ElementTree module and finds the first Array_2D_Image element to read in. A second function, read_pds4_array(), determines the correct data type and shape, then reads the data from the file.

A class specifically designed for PDS4 Array_2D_Image objects, aptly named PDS4_Array_2D_Image, is initialized with the data, the label describing the data, and the local_identifier of the array. The local_identifier is not normally required in PDS4 array objects. However, it must be present when the image display orientation is provided via Display_Settings, because Display_Settings refers to the array by that identifier. Since both are present in the BIRC labels, our class assumes local_identifier is included. The class then determines the image orientation. The image, as stored in the file, is available in the data attribute. The display_data attribute provides a copy oriented for display with the origin in the lower left corner.
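The link between an array and its Display_Settings runs through local_identifier, and the lookup can be sketched as follows. The label fragment and the function name find_display_direction are hypothetical. The element names follow the ones used in the minimal working example below. The reference text is stripped before comparison because the BIRC sample labels pad it with whitespace.

```python
import xml.etree.ElementTree as ET

NS = {'pds4': 'http://pds.nasa.gov/pds4/pds/v1',
      'disp': 'http://pds.nasa.gov/pds4/disp/v1'}

# Hypothetical label fragment: only the elements used below are shown.
LABEL = """<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1"
    xmlns:disp="http://pds.nasa.gov/pds4/disp/v1">
  <Observation_Area>
    <Discipline_Area>
      <disp:Display_Settings>
        <disp:Local_Internal_Reference>
          <disp:local_identifier_reference> image </disp:local_identifier_reference>
        </disp:Local_Internal_Reference>
        <disp:Display_Direction>
          <disp:horizontal_display_axis>Sample</disp:horizontal_display_axis>
          <disp:horizontal_display_direction>Left to Right</disp:horizontal_display_direction>
          <disp:vertical_display_axis>Line</disp:vertical_display_axis>
          <disp:vertical_display_direction>Top to Bottom</disp:vertical_display_direction>
        </disp:Display_Direction>
      </disp:Display_Settings>
    </Discipline_Area>
  </Observation_Area>
</Product_Observational>"""


def find_display_direction(root, local_identifier):
    """Return the Display_Direction values for the array with the given
    local_identifier, or None if no Display_Settings refers to it."""
    path = 'pds4:Observation_Area/pds4:Discipline_Area/disp:Display_Settings'
    for settings in root.findall(path, NS):
        ref = settings.find(
            'disp:Local_Internal_Reference/disp:local_identifier_reference',
            NS)
        # Compare stripped text: the reference may be padded with spaces.
        if ref is not None and ref.text.strip() == local_identifier:
            direction = settings.find('disp:Display_Direction', NS)
            # Drop the '{namespace}' part of each child tag for the keys.
            return {child.tag.split('}')[1]: child.text.strip()
                    for child in direction}
    return None


print(find_display_direction(ET.fromstring(LABEL), 'image'))
```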

Download File:Birc example reader.zip.

Minimal Working Example

Rather than listing birc_example_reader.py here, we provide below a minimal working example with the same basic functionality. The example is a flat script with extensive comments, which may more clearly illustrate some of the methods for working with PDS4 image labels.

birc_mwe.py

"""
birc_mwe --- Minimal working example to read and orient BIRC images
===================================================================

Execute this script:
  * On the command line: python birc_mwe.py
  * In IPython: run birc_mwe.py

The data will be in a variable named `data`.  The data for display
(origin in the lower left) will be in a variable named `display_data`.

Little to no error checking or label validation is done for this
example.

"""

# required modules
import os
import xml.etree.ElementTree as ET
import numpy as np
import matplotlib.pyplot as plt

# The PDS4 label file name
label_name = 'output/rawFitsFrame/cerh2_1_010000_rb_n169_n011.xml'

# XML namespace definitions
ns = {'pds4': 'http://pds.nasa.gov/pds4/pds/v1',
      'disp': 'http://pds.nasa.gov/pds4/disp/v1'}

# read in the label
label = ET.parse(label_name)

# Find the first File_Area_Observational element with an
# Array_2D_Image and assume it is what we want (OK assumption for BIRC
# test labels).
file_area = label.find('./pds4:File_Area_Observational/[pds4:Array_2D_Image]',
                       ns)

# Image file name, prefixed with the path to the label
file_name = file_area.find('./pds4:File/pds4:file_name', ns).text.strip()
file_name = os.path.join(os.path.dirname(label_name), file_name)

# Find the array class, the local identifier and data type of the
# array.  Transform the PDS4 data type into a NumPy data type (i.e.,
# dtype).
array = file_area.find('./pds4:Array_2D_Image', ns)
local_identifier = array.find('./pds4:local_identifier', ns).text.strip()
pds4_to_numpy_dtypes = {
    "IEEE754MSBSingle": '>f4'  # All BIRC example data is IEEE754MSBSingle
}
data_type = array.find('pds4:Element_Array/pds4:data_type', ns).text.strip()
dtype = np.dtype(pds4_to_numpy_dtypes[data_type])

# determine the array shape
shape = []
for i in [1, 2]:
    # find the number of elements along axis i
    k = './pds4:Axis_Array/[pds4:sequence_number="{}"]/pds4:elements'.format(i)
    shape.append(int(array.find(k, ns).text))

# read in the data; the file must be opened in binary mode
offset = array.find('./pds4:offset', ns)
with open(file_name, 'rb') as inf:
    inf.seek(int(offset.text))
    data = np.fromfile(inf, dtype, count=np.prod(shape)).reshape(shape)

# Rotate the data into display orientation (origin in lower left).

# find display_settings_to_array for local_identifier in
# Display_Settings.  The BIRC sample labels have extra whitespace in
# the Display_Settings local_identifier_reference, so we cannot use a
# search similar to the one we did above for array shape with
# ElementTree's limited xpath support.
display_settings = None
xpath = ('./pds4:Observation_Area/pds4:Discipline_Area/disp:Display_Settings')
for e in label.findall(xpath, ns):
    k = './disp:Local_Internal_Reference/disp:local_identifier_reference'
    reference = e.find(k, ns).text.strip()
    if reference == local_identifier:
        display_settings = e
        break

# determine display directions
dd = display_settings.find('./disp:Display_Direction', ns)
h = dd.find('disp:horizontal_display_direction', ns)
v = dd.find('disp:vertical_display_direction', ns)
display_directions = (h.text.strip(), v.text.strip())
del h, v

# determine horizontal and vertical axis array indices
h_axis_name = dd.find('./disp:horizontal_display_axis', ns).text.strip()
for axis in array.findall('./pds4:Axis_Array', ns):
    if axis.find('./pds4:axis_name', ns).text.strip() == h_axis_name:
        horizontal_axis = int(axis.find('./pds4:sequence_number', ns).text) - 1

v_axis_name = dd.find('./disp:vertical_display_axis', ns).text.strip()
for axis in array.findall('./pds4:Axis_Array', ns):
    if axis.find('./pds4:axis_name', ns).text.strip() == v_axis_name:
        vertical_axis = int(axis.find('./pds4:sequence_number', ns).text) - 1

# Move the vertical axis to axis number 0
display_data = np.rollaxis(data, vertical_axis, 0)
if 'Right to Left' in display_directions:
    display_data = display_data[:, ::-1]
if 'Top to Bottom' in display_directions:
    display_data = display_data[::-1]

plt.clf()
plt.imshow(display_data, origin='lower')
plt.draw()
plt.show()
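The minimal working example maps only IEEE754MSBSingle, which is the type used by all BIRC example data. The sketch below extends the idea to other PDS4 binary data types and decodes an array that starts at a byte offset. It uses the standard struct module so that it runs without NumPy. The mapping is an illustrative subset, and the function name read_array is hypothetical. The NumPy dtype equivalents follow the same pattern, for example '>f4' for IEEE754MSBSingle or '<i2' for SignedLSB2.

```python
import struct

# Illustrative subset of PDS4 binary data types -> struct format codes.
# '>' means big-endian (MSB), '<' little-endian (LSB); single bytes have
# no byte order.
PDS4_TO_STRUCT = {
    'IEEE754MSBSingle': '>f', 'IEEE754MSBDouble': '>d',
    'IEEE754LSBSingle': '<f', 'IEEE754LSBDouble': '<d',
    'SignedByte': 'b', 'UnsignedByte': 'B',
    'SignedMSB2': '>h', 'UnsignedMSB2': '>H',
    'SignedMSB4': '>i', 'UnsignedMSB4': '>I',
    'SignedMSB8': '>q', 'UnsignedMSB8': '>Q',
    'SignedLSB2': '<h', 'UnsignedLSB2': '<H',
    'SignedLSB4': '<i', 'UnsignedLSB4': '<I',
    'SignedLSB8': '<q', 'UnsignedLSB8': '<Q',
}


def read_array(buf, data_type, shape, offset=0):
    """Decode a 2-D array stored with the last index varying fastest,
    starting at byte `offset` of `buf`; return it as a list of rows."""
    fmt = PDS4_TO_STRUCT[data_type]
    # Split the optional byte-order prefix from the type code.
    if fmt[0] in '<>':
        order, code = fmt[0], fmt[1:]
    else:
        order, code = '<', fmt
    nrows, ncols = shape
    values = struct.unpack_from('%s%d%s' % (order, nrows * ncols, code),
                                buf, offset)
    return [list(values[r * ncols:(r + 1) * ncols]) for r in range(nrows)]


# Four padding bytes followed by six big-endian floats.
buf = b'\x00' * 4 + struct.pack('>6f', 1, 2, 3, 4, 5, 6)
print(read_array(buf, 'IEEE754MSBSingle', (2, 3), offset=4))
```

For a real file, the bytes would come from open(file_name, 'rb').read(). For large images, np.fromfile with the equivalent dtype, as in the script above, is considerably faster.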

Reading PDS4 Tables

Introduction

A Python package that can read and display PDS4 table data is available. Currently only Table_Character and Table_Binary objects are supported. In the future this tool is expected to support all PDS4 objects. The package expects labels that pass PDS4 Schema and Schematron validation. It performs additional validation against the PDS4 Standards and, optionally, against some PDS-SBN standards. A PDS4 data viewer is also available for supported objects.
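The sketch below shows what the field descriptions in Table_Character and Table_Binary labels amount to. Each field is located by a 1-based field_location and a length, or a binary type. This is not the pds4_tools implementation. The field definitions, the sample records, the 22-byte record length (20 data bytes plus a CR/LF delimiter) and all function names are hypothetical.

```python
import struct

# Hypothetical Table_Character fields:
# (name, field_location, field_length, data_type).
# field_location is 1-based, as in PDS4 labels.
FIELDS = [('target', 1, 8, 'ASCII_String'),
          ('time', 10, 6, 'ASCII_Real'),
          ('count', 17, 4, 'ASCII_Integer')]

CONVERTERS = {'ASCII_String': str.strip, 'ASCII_Real': float,
              'ASCII_Integer': int}

TABLE = ('Ceres     1.500   12\r\n'
         'Vesta    12.250  307\r\n')


def split_records(text, record_length):
    """Split fixed-length records; record_length includes the CR/LF."""
    return [text[i:i + record_length]
            for i in range(0, len(text), record_length)]


def parse_record(record, fields=FIELDS):
    """Slice one fixed-width character record into typed values."""
    row = {}
    for name, location, length, data_type in fields:
        text = record[location - 1:location - 1 + length]
        row[name] = CONVERTERS[data_type](text)
    return row


def parse_binary_record(record, fields):
    """fields: (name, field_location, struct format) tuples; subtract 1
    from the 1-based field_location to get the byte offset."""
    return dict((name, struct.unpack_from(fmt, record, location - 1)[0])
                for name, location, fmt in fields)


for record in split_records(TABLE, 22):
    print(parse_record(record))
```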

Contact Lev Nagdimunov with questions or comments regarding this code or its description.

Requirements

This tool assumes the user is running Python 2.6 or 2.7. There are no additional requirements for reading PDS4 data, although a recent NumPy package can be used if installed. To visualize data, Tkinter is required; it is part of the standard Python distribution on most platforms.
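NumPy is optional here. A common way to support an optional dependency is to attempt the import and fall back to plain Python types when it fails. The sketch below illustrates that general pattern. It is not taken from pds4_tools, and the function name as_column is hypothetical.

```python
try:
    import numpy as np
except ImportError:
    np = None  # NumPy is optional; fall back to plain lists


def as_column(values, use_numpy=True):
    """Return values as a NumPy array when NumPy is installed and
    requested, otherwise as a plain list."""
    if use_numpy and np is not None:
        return np.asarray(values)
    return list(values)


print(as_column([1, 2, 3], use_numpy=False))
```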

Installation

Option 1

Download the ZIP file File:PDS4 tools-0.1.zip and extract it. To use it, follow the instructions in Example Usage, but first run the following lines so that Python can find the extracted package:

import sys
sys.path.extend(['/path/to/your/extraction/directory'])

# Or
# On Windows, remember to escape your backslashes

import sys
sys.path.extend(['C:\\path\\to\\your\\extraction\\directory'])

Option 2

Download the ZIP file File:PDS4 tools-0.1.zip. You can install it with "pip install PDS4_tools-0.1.zip" or "easy_install PDS4_tools-0.1.zip". Alternatively, extract the ZIP file and run "python setup.py install" from within the extracted directory. No uninstall script is provided, but you can use "pip uninstall pds4_tools". Note also that this tool currently lacks many features and will be updated in the future.

Example Usage

You may call pds4_read from the command line:

usage: pds4_read.py [-h] [--quiet] [--use_numpy] [--object_num OBJECT_NUM]
                    [--object_name OBJECT_NAME] [--object_lid OBJECT_LID]
                    filename

positional arguments:
  filename              Filename, including full path, of the label

optional arguments:
  -h, --help            show this help message and exit
  --quiet               Suppresses all info/warnings
  --use_numpy           Returned data will be a numpy array and use numpy data types
  --object_num OBJECT_NUM
                        Only reads the data object specified by zero-based order (integer)
  --object_name OBJECT_NAME
                        Only reads the data object specified by name
  --object_lid OBJECT_LID
                        Only reads the data object specified by local identifier
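Each command-line option corresponds to a keyword argument. The argparse parser below is a hypothetical reconstruction from the usage text above and not the pds4_tools source. It shows how the parsed options become a dictionary of keyword arguments.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the pds4_read usage text above."""
    parser = argparse.ArgumentParser(prog='pds4_read.py')
    parser.add_argument('filename',
                        help='Filename, including full path, of the label')
    parser.add_argument('--quiet', action='store_true',
                        help='Suppresses all info/warnings')
    parser.add_argument('--use_numpy', action='store_true',
                        help='Returned data will be a numpy array')
    parser.add_argument('--object_num', type=int,
                        help='Zero-based order of the data object to read')
    parser.add_argument('--object_name',
                        help='Name of the data object to read')
    parser.add_argument('--object_lid',
                        help='Local identifier of the data object to read')
    return parser


args = build_parser().parse_args(
    ['--object_num', '0', '--quiet', '/path/to/label.xml'])
kwargs = dict(vars(args))
filename = kwargs.pop('filename')
print(filename, kwargs)
```

The call would then be pds4_read(filename, **kwargs), assuming pds4_read accepts keyword arguments with the same names as the options.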

You may also call it from another module or script. All of the optional arguments above are available as optional named parameters of pds4_read(). A basic example follows:

""" Basic pds4_read example """

from pds4_tools import pds4_read

data = pds4_read('/path/to/label.xml')

# Dict-like access
column = data['table_name']['field_name']
row_1_to_100 = column[0:100]

# List-like access
column = data[0][0]
row_1_to_100 = column[0:100]

# Meta-data access
column_meta = data['table_name'].meta_data('field_name')
column_meta = data[0].meta_data(0)

print(column_meta.description)
print(column_meta.unit)
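The example above accesses tables and fields either by name or by position. One way to model that behavior is an ordered mapping that also accepts integer indices. The class below is a hypothetical sketch of the access pattern. It is not the object pds4_read returns, and it omits meta_data().

```python
from collections import OrderedDict


class NamedIndexedCollection(object):
    """Hypothetical container allowing both obj['name'] and obj[0]."""

    def __init__(self, items):
        # items: iterable of (name, value) pairs; order is preserved.
        self._items = OrderedDict(items)

    def __getitem__(self, key):
        if isinstance(key, int):
            return list(self._items.values())[key]  # list-like access
        return self._items[key]                     # dict-like access

    def __len__(self):
        return len(self._items)


table = NamedIndexedCollection([('time', [0.0, 1.0, 2.0]),
                                ('flux', [10.5, 11.0, 9.8])])
data = NamedIndexedCollection([('table_name', table)])

print(data['table_name']['time'][0:2])
print(data[0][0][0:2])
```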

To display the objects in a label, you may call pds4_viewer from the command line:

usage: pds4_viewer.py [-h] [--quiet] [--object_num OBJECT_NUM]
                    [--object_name OBJECT_NAME] [--object_lid OBJECT_LID]
                    [filename]

positional arguments:
  filename              Filename, including full path, of the label

optional arguments:
  -h, --help            show this help message and exit
  --quiet               Suppresses all info/warnings
  --object_num OBJECT_NUM
                        Only reads the data object specified by zero-based order (integer)
  --object_name OBJECT_NAME
                        Only reads the data object specified by name
  --object_lid OBJECT_LID
                        Only reads the data object specified by local identifier

The filename argument is optional for PDS4 Viewer. If you omit it, a GUI opens from which you can open labels.

You may also call pds4_viewer from another module or script. All of the optional arguments above are available as optional named parameters. A basic example follows:

""" Basic pds4_viewer example """

from pds4_tools import pds4_read, pds4_viewer

pds4_viewer()

# or

pds4_viewer('label.xml')

# or 

data = pds4_read('label.xml')
pds4_viewer('label.xml', from_existing_data=data) # Won't re-read the data