Example Python Reader for PDS4 Images

From The SBN Wiki

Latest revision as of 19:25, 18 June 2015

Deprecation Warning

This page has been superseded by Python PDS4 Tools and will be removed at a future date.

Introduction

This document describes an example Python module that can read an image from a PDS4 data product. The code reads the data based on the label keywords, but does not otherwise validate the label. If the user wants to display the image, the code uses the label's Display_Settings to provide a copy of the image in the correct orientation for drawing with the origin in the lower left corner. The code below is designed for reading BOPPS BIRC images, but can serve as an example for other limited problems. A more general solution will likely use a different approach.

Contact Mike Kelley with questions or comments regarding this page.

Requirements

This example assumes the user is running Python 2.7, with a recent NumPy package installed. The visualization example uses matplotlib.

Goal and Method

The goal is to read an image from a BOPPS BIRC data product into a NumPy array and provide the correct orientation for display. We pass the name of the label to a function, which will then:

  1. Open the label.
  2. Find the data product file name.
  3. Determine the Array_2D_Image data type and shape.
  4. Read in the data array.
  5. Return the array and metadata in a single object.

The object will have two attributes that allow access to the data:

  1. the data with the axis order and orientation as provided in the file, and
  2. the data with the axis order and orientation reconfigured according to the label's Display_Settings class, so that it will have the correct orientation if drawn with the origin in the lower left corner.
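
As a rough illustration of the second attribute, the sketch below reorients a small array for display with the origin in the lower left. The function name `to_display_orientation` and its arguments are hypothetical and are not part of the BIRC reader. The direction strings are the ones tested by the example script on this page. `np.moveaxis` is used here; for a 2D image it has the same effect as the `np.rollaxis` call in the script.

```python
import numpy as np


def to_display_orientation(data, vertical_axis, display_directions):
    """Return a 2D array with rows as the vertical axis, ordered for
    drawing with the origin in the lower left corner.

    vertical_axis is the zero-based index of the axis that the label
    says should be drawn vertically; display_directions is a pair of
    strings such as ('Left to Right', 'Bottom to Top').
    """
    # Put the vertical axis first so axis 0 is rows and axis 1 is columns.
    im = np.moveaxis(data, vertical_axis, 0)
    # Columns stored right-to-left must be mirrored horizontally.
    if 'Right to Left' in display_directions:
        im = im[:, ::-1]
    # Rows stored top-to-bottom must be flipped, because row 0 is drawn
    # at the bottom when the origin is in the lower left.
    if 'Top to Bottom' in display_directions:
        im = im[::-1]
    return im


# Hypothetical 2 x 3 image whose rows are stored top to bottom.
data = np.arange(6).reshape(2, 3)
display = to_display_orientation(data, 0, ('Left to Right', 'Top to Bottom'))
```

The slicing returns views, so `display` shares memory with `data`; copy it first if the display version will be modified independently.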

Implementation Details

For this basic example, we designed the reader as a module named birc_example_reader. The user calls a single function, birc_example_reader.read_image(), passing the name of the label as the first argument. The function loads the label using the ElementTree module and finds the first Array_2D_Image element to read in. A second function, read_pds4_array(), determines the correct data type and shape, then reads the data from the file. A class specifically designed for PDS4 Array_2D_Image objects, aptly named PDS4_Array_2D_Image, is initialized with the data, the label describing the data, and the local_identifier of the array. The local_identifier is not normally required in PDS4 array objects, but it must be present when the image display orientation is provided via Display_Settings. Since Display_Settings is present in the BIRC labels, our class assumes local_identifier is included. The class then determines the image orientation. The image is stored in the attribute data. The attribute display_data is also provided, which can be used for displaying with the origin in the lower left corner.
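
The label lookups described above can be sketched with ElementTree alone. The label fragment below is hypothetical and heavily trimmed for illustration (a real PDS4 label contains many more elements), and the variable names are assumptions rather than the reader's actual internals.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed label fragment for illustration only.
LABEL = """<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1">
  <File_Area_Observational>
    <File><file_name>image.dat</file_name></File>
    <Array_2D_Image>
      <local_identifier>image</local_identifier>
      <offset unit="byte">0</offset>
      <axes>2</axes>
      <Element_Array><data_type>IEEE754MSBSingle</data_type></Element_Array>
      <Axis_Array>
        <axis_name>Line</axis_name>
        <elements>2</elements>
        <sequence_number>1</sequence_number>
      </Axis_Array>
      <Axis_Array>
        <axis_name>Sample</axis_name>
        <elements>3</elements>
        <sequence_number>2</sequence_number>
      </Axis_Array>
    </Array_2D_Image>
  </File_Area_Observational>
</Product_Observational>"""

# Prefix-to-URI map so the search paths can use the 'pds4:' prefix.
ns = {'pds4': 'http://pds.nasa.gov/pds4/pds/v1'}
root = ET.fromstring(LABEL)

# First File_Area_Observational that contains an Array_2D_Image.
file_area = root.find('pds4:File_Area_Observational[pds4:Array_2D_Image]', ns)
array = file_area.find('pds4:Array_2D_Image', ns)

file_name = file_area.find('pds4:File/pds4:file_name', ns).text.strip()
data_type = array.find('pds4:Element_Array/pds4:data_type', ns).text.strip()

# Collect the axis lengths in sequence_number order.
axes = sorted(array.findall('pds4:Axis_Array', ns),
              key=lambda a: int(a.find('pds4:sequence_number', ns).text))
shape = tuple(int(a.find('pds4:elements', ns).text) for a in axes)
```

Sorting on the integer value of sequence_number, rather than matching its text in a search predicate, is one way to tolerate stray whitespace in the label.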

Download File:Birc example reader.zip.

Minimal Working Example

Rather than list birc_example_reader.py here, we provide below a minimal working example with the same basic functionality. The example is a flat script with extensive comments, which may more clearly illustrate some of the methods for working with PDS4 image labels.

birc_mwe.py

"""
birc_mwe --- Minimal working example to read and orient BIRC images
===================================================================

Execute this script:
  * On the command line: python birc_mwe.py
  * In IPython: run birc_mwe.py

The data will be in a variable named `data`.  The data for display
(origin in the lower left) will be in a variable named `display_data`.

Little to no error checking or label validation is done for this
example.

"""

# required modules
import os
import xml.etree.ElementTree as ET
import numpy as np
import matplotlib.pyplot as plt

# The PDS4 label file name
label_name = 'output/rawFitsFrame/cerh2_1_010000_rb_n169_n011.xml'

# XML namespace definitions
ns = {'pds4': 'http://pds.nasa.gov/pds4/pds/v1',
      'disp': 'http://pds.nasa.gov/pds4/disp/v1'}

# read in the label
label = ET.parse(label_name)

# Find the first File_Area_Observational element with an
# Array_2D_Image and assume it is what we want (OK assumption for BIRC
# test labels).
file_area = label.find('./pds4:File_Area_Observational/[pds4:Array_2D_Image]',
                       ns)

# Image file name, prefixed with the path to the label
file_name = file_area.find('./pds4:File/pds4:file_name', ns).text.strip()
file_name = os.path.join(os.path.dirname(label_name), file_name)

# Find the array class, the local identifier and data type of the
# array.  Transform the PDS4 data type into a NumPy data type (i.e.,
# dtype).
array = file_area.find('./pds4:Array_2D_Image', ns)
local_identifier = array.find('./pds4:local_identifier', ns).text.strip()
pds4_to_numpy_dtypes = {
    "IEEE754MSBSingle": '>f4'  # All BIRC example data is IEEE754MSBSingle
}
data_type = array.find('pds4:Element_Array/pds4:data_type', ns).text.strip()
dtype = np.dtype(pds4_to_numpy_dtypes[data_type])

# determine the array shape
shape = []
for i in [1, 2]:
    # find the number of elements along the axis with sequence number i
    k = './pds4:Axis_Array/[pds4:sequence_number="{}"]/pds4:elements'.format(i)
    shape.append(int(array.find(k, ns).text))

# read in the data; open in binary mode so the bytes are read unaltered
offset = array.find('./pds4:offset', ns)
with open(file_name, 'rb') as inf:
    inf.seek(int(offset.text))
    data = np.fromfile(inf, dtype, count=np.prod(shape)).reshape(shape)

# Rotate the data into display orientation (origin in lower left).

# Find the Display_Settings element that references local_identifier.
# The BIRC sample labels have extra whitespace in the Display_Settings
# local_identifier_reference, so we cannot use a predicate search like
# the one used above for the array shape (ElementTree's XPath support
# is limited and compares text exactly).
display_settings = None
xpath = './pds4:Observation_Area/pds4:Discipline_Area/disp:Display_Settings'
for e in label.findall(xpath, ns):
    k = './disp:Local_Internal_Reference/disp:local_identifier_reference'
    reference = e.find(k, ns).text.strip()
    if reference == local_identifier:
        display_settings = e
        break

# determine display directions
dd = display_settings.find('./disp:Display_Direction', ns)
h = dd.find('disp:horizontal_display_direction', ns)
v = dd.find('disp:vertical_display_direction', ns)
display_directions = (h.text.strip(), v.text.strip())
del h, v

# determine horizontal and vertical axis array indices
h_axis_name = dd.find('./disp:horizontal_display_axis', ns).text.strip()
for axis in array.findall('./pds4:Axis_Array', ns):
    if axis.find('./pds4:axis_name', ns).text.strip() == h_axis_name:
        horizontal_axis = int(axis.find('./pds4:sequence_number', ns).text) - 1

v_axis_name = dd.find('./disp:vertical_display_axis', ns).text.strip()
for axis in array.findall('./pds4:Axis_Array', ns):
    if axis.find('./pds4:axis_name', ns).text.strip() == v_axis_name:
        vertical_axis = int(axis.find('./pds4:sequence_number', ns).text) - 1

# Move the vertical axis to axis number 0
display_data = np.rollaxis(data, vertical_axis, 0)
if 'Right to Left' in display_directions:
    display_data = display_data[:, ::-1]
if 'Top to Bottom' in display_directions:
    display_data = display_data[::-1]

plt.clf()
plt.imshow(display_data, origin='lower')
plt.draw()
plt.show()
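
The script above does little error checking when it reads the array. The sketch below shows one way the read step might be wrapped with a few checks and exercised on a temporary file. The function name `read_array`, the extra entries in the data type table, and the 16-byte header are assumptions made for illustration; only IEEE754MSBSingle occurs in the BIRC example data.

```python
import os
import tempfile

import numpy as np

# Mapping from a few PDS4 data_type values to NumPy dtype strings.
# Only IEEE754MSBSingle appears in the BIRC example; the other entries
# are assumptions added for illustration.
PDS4_TO_NUMPY = {
    'IEEE754MSBSingle': '>f4',
    'IEEE754MSBDouble': '>f8',
    'IEEE754LSBSingle': '<f4',
    'IEEE754LSBDouble': '<f8',
    'UnsignedByte': 'u1',
}


def read_array(file_name, data_type, shape, offset=0):
    """Read a Last-Index-Fastest array starting `offset` bytes into a file."""
    try:
        dtype = np.dtype(PDS4_TO_NUMPY[data_type])
    except KeyError:
        raise NotImplementedError(
            'PDS4 data_type {} not implemented.'.format(data_type))
    count = int(np.prod(shape))
    # Binary mode matters: text mode can translate bytes on some platforms.
    with open(file_name, 'rb') as inf:
        inf.seek(offset)
        data = np.fromfile(inf, dtype, count=count)
    if data.size != count:
        raise IOError('Expected {} elements, read {}.'.format(count, data.size))
    return data.reshape(shape)


# Round trip: write a 16-byte header plus six big-endian floats, read back.
expected = np.arange(6, dtype='>f4').reshape(2, 3)
path = os.path.join(tempfile.mkdtemp(), 'image.dat')
with open(path, 'wb') as outf:
    outf.write(b'\0' * 16)
    outf.write(expected.tobytes())

result = read_array(path, 'IEEE754MSBSingle', (2, 3), offset=16)
```

Raising NotImplementedError for an unknown data type makes an unsupported label fail with a clear message instead of a bare KeyError.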