Difference between revisions of "Example Python Reader for PDS4 Images"
m (→birc.py) |
m (Clean up text, add contact details.) |
||
Line 2: | Line 2: | ||
== Introduction == | == Introduction == | ||
− | This document describes | + | This document describes an example Python module that can read an image from a PDS4 data product. The code will read the data based on the label keywords, but does not otherwise validate the label. If the user wants to display the image, the code will consider the label's <tt>Display_Settings</tt>, and provide a copy of the image in the correct orientation for drawing with the origin in the lower left corner. The code below is designed for reading BOPPS BIRC images, but can be used as an example for other limited problems. A more general solution will likely use a different approach. |
+ | |||
+ | Contact [[User:Kelley|Mike Kelley]] with questions or comments regarding this page. | ||
== Requirements == | == Requirements == | ||
Line 10: | Line 12: | ||
== Goal and Method == | == Goal and Method == | ||
− | The goal is to read in an image from a BOPPS BIRC data product into a Numpy array. We will provide | + | The goal is to read in an image from a BOPPS BIRC data product into a Numpy array, providing the correct orientation for display. We will provide a function with the name of the label, the function will then |
# Open the label. | # Open the label. | ||
# Find the data product file name. | # Find the data product file name. | ||
# Determine the Array_2D_Image data type and shape. | # Determine the Array_2D_Image data type and shape. | ||
# Read in the data array. | # Read in the data array. | ||
− | # Return the array. | + | # Return the array and meta data in a single object. |
+ | |||
+ | The object will have two attributes that allow access to the data | ||
+ | # the data with the axis order and orientation as provided in the file, and | ||
+ | # the data with the axis order and orientation reconfigured according to the label's <tt>Display_Settings</tt> class, so that it will have the correct orientation if drawn with the origin in the lower left corner. | ||
− | == Implementation == | + | == Implementation Details == |
− | For this basic example, we designed the reader as a function in a module named <tt> | + | For this basic example, we designed the reader as a function in a module named <tt>birc_example_reader</tt>. The user calls a single function, <tt>birc.read_image()</tt>, passing the name of the label as the first argument. The function will load the label using the [https://docs.python.org/2.7/library/xml.etree.elementtree.html ElementTree] module and find the first <tt>Array_2D_Image</tt> element to read in. A second function, <tt>read_pds4_array()</tt>, determines the correct data type and shape, then reads the data from the file. A class specifically designed for PDS4 <tt>Array_2D_Image</tt> objects, aptly named <tt>PDS4_Array_2D_Image</tt>, is initialized with the data, the label describing the data, and the <tt>local_identifier</tt> of the array. The <tt>local_identifier</tt> is not normally required in PDS4 array objects, but it must be present when the image display orientation is provided via <tt>Display_Settings</tt>. Since these are present in the BIRC labels, our class assumes <tt>local_identifier</tt> is included. The class then determines the image orientation. The image is stored as a class attribute <tt>data</tt>. The class attribute <tt>display_data</tt> is also provided, which can be used for displaying with the origin in the lower left corner. The code is listed below. |
== Example == | == Example == | ||
− | Using the <tt> | + | Using the <tt>birc_example_reader.py</tt> file listed below, this example reads in an image, zeros out the first 10 rows of the array, then displays the data with <tt>matplotlib</tt>. |
<pre> | <pre> | ||
Line 31: | Line 37: | ||
# the image is im.data | # the image is im.data | ||
# the image for displaying purposes is im.display_data | # the image for displaying purposes is im.display_data | ||
− | im = birc.read_image('cerh2_1_010000_rb_n169_n011.xml') | + | im = birc.read_image('output/rawFitsFrame/cerh2_1_010000_rb_n169_n011.xml') |
# zero out the first 10 rows | # zero out the first 10 rows |
Revision as of 18:49, 20 May 2015
Contents
Introduction
This document describes an example Python module that can read an image from a PDS4 data product. The code will read the data based on the label keywords, but does not otherwise validate the label. If the user wants to display the image, the code will consider the label's Display_Settings, and provide a copy of the image in the correct orientation for drawing with the origin in the lower left corner. The code below is designed for reading BOPPS BIRC images, but can be used as an example for other limited problems. A more general solution will likely use a different approach.
Contact Mike Kelley with questions or comments regarding this page.
Requirements
This example assumes the user is running Python 2.7, with a recent NumPy package installed. The visualization example uses matplotlib.
Goal and Method
The goal is to read in an image from a BOPPS BIRC data product into a Numpy array, providing the correct orientation for display. We will provide a function with the name of the label, the function will then
- Open the label.
- Find the data product file name.
- Determine the Array_2D_Image data type and shape.
- Read in the data array.
- Return the array and meta data in a single object.
The object will have two attributes that allow access to the data
- the data with the axis order and orientation as provided in the file, and
- the data with the axis order and orientation reconfigured according to the label's Display_Settings class, so that it will have the correct orientation if drawn with the origin in the lower left corner.
Implementation Details
For this basic example, we designed the reader as a function in a module named birc_example_reader. The user calls a single function, birc.read_image(), passing the name of the label as the first argument. The function will load the label using the ElementTree module and find the first Array_2D_Image element to read in. A second function, read_pds4_array(), determines the correct data type and shape, then reads the data from the file. A class specifically designed for PDS4 Array_2D_Image objects, aptly named PDS4_Array_2D_Image, is initialized with the data, the label describing the data, and the local_identifier of the array. The local_identifier is not normally required in PDS4 array objects, but it must be present when the image display orientation is provided via Display_Settings. Since these are present in the BIRC labels, our class assumes local_identifier is included. The class then determines the image orientation. The image is stored as a class attribute data. The class attribute display_data is also provided, which can be used for displaying with the origin in the lower left corner. The code is listed below.
Example
Using the birc_example_reader.py file listed below, this example reads in an image, zeros out the first 10 rows of the array, then displays the data with matplotlib.
import birc_example_reader as birc import matplotlib.pyplot as plt # the image is im.data # the image for displaying purposes is im.display_data im = birc.read_image('output/rawFitsFrame/cerh2_1_010000_rb_n169_n011.xml') # zero out the first 10 rows im[:10] = 0 # display the result plt.clf() plt.imshow(im.display_data, origin='lower') plt.draw()
birc_example_reader.py
""" birc_example_reader --- Example PDS4 Array_2D_Image reader for BOPPS/BIRC data ============================================================================== Example ------- import birc_example_reader as birc import matplotlib.pyplot as plt # array is im.data # array for displaying is im.display_data im = birc.read_image('cerh2_1_010000_rb_n169_n011.xml') plt.clf() plt.imshow(im.display_data, origin='lower') plt.draw() Classes ------- PDS4_Array_2D_Image - A PDS4 2D image. Functions --------- read_image - Read a BIRC image described by a PDS4 label file. read_pds4_array - Read a PDS4 data array. """ import numpy as np import xml.etree.ElementTree as ET class PDS4_Array_2D_Image(object): """A PDS4 array for 2D images, with limited functionality. Parameters ---------- data : ndarray The data. local_identifier : string The label's local_identifier for this array. label : ElementTree The PDS4 label that contains the description of the array. Attributes ---------- data : ndarray See Parameters. label : ElementTree See Parameters. local_identifier : string See Parameters. horizontal_axis : int The index of the horizontal axis for display. vertical_axis : int The index of the vertical axis for display. display_data : ndarray The data array rotated into display orientation, assuming the display will draw the image with the origin in the lower left corner. The vertical axis will be axis 0, the horizontal axis will be axis 1. """ def __init__(self, data, local_identifier, label): self.data = data self.local_identifier = local_identifier self.label = label self._orient() def _orient(self): """Set object image orientation attributes.""" # namespace definitions ns = {'pds4': 'http://pds.nasa.gov/pds4/pds/v1', 'disp': 'http://pds.nasa.gov/pds4/disp/v1'} # find local_identifier in File_Area_Observational array = None xpath = ('./pds4:File_Area_Observational/pds4:Array_2D_Image/' '[pds4:local_identifier]') for e in self.label.findall(xpath, ns): this_local_id = e.find('./pds4:local_identifier', ns).text.strip() if this_local_id == self.local_identifier: array = e break assert array is not None, "Array_2D_Image with local_identifier == {} not found.".format(self.local_identifier) # find display_settings_to_array for local_identifier in # Display_Settings display_settings = None xpath = ('./pds4:Observation_Area/pds4:Discipline_Area/' 'disp:Display_Settings') for e in self.label.findall(xpath, ns): lir = e.find('./disp:Local_Internal_Reference', ns) reference = lir.find('./disp:local_identifier_reference', ns).text.strip() if reference == self.local_identifier: display_settings = e break assert display_settings is not None, "Display_Settings for local_identifier == {} not found.".format(self.local_identifier) # determine display directions display_dir = e.find('./disp:Display_Direction', ns) h = display_dir.find('./disp:horizontal_display_direction', ns) v = display_dir.find('./disp:vertical_display_direction', ns) self.display_directions = (h.text.strip(), v.text.strip()) # determine horizonal and vertical axes haxis = display_dir.find('./disp:horizontal_display_axis', ns).text.strip() for axis in array.findall('./pds4:Axis_Array', ns): if axis.find('./pds4:axis_name', ns).text.strip() == haxis: sn = int(axis.find('./pds4:sequence_number', ns).text.strip()) self.horizontal_axis = sn - 1 vaxis = display_dir.find('./disp:vertical_display_axis', ns).text.strip() for axis in array.findall('./pds4:Axis_Array', ns): if axis.find('./pds4:axis_name', ns).text.strip() == vaxis: sn = int(axis.find('./pds4:sequence_number', ns).text.strip()) self.vertical_axis = sn - 1 @property def display_data(self): # only need to roll one axis for a 2D image im = np.rollaxis(self.data, self.vertical_axis) if 'Right to Left' in self.display_directions: im = im[:, ::-1] if 'Top to Bottom' in self.display_directions: im = im[::-1] return im def read_image(file_name): """Read a BIRC image described by a PDS4 label file. Only the first Array_2D_Image is returned, based on BIRC PDS4 sample data files. Parameters ---------- file_name : string The name of the PDS4 label file describing the BIRC image. Returns ------- im : PDS4_Array_2D_Image The image Raises ------ NotImplementedError """ import os # namespace definitions ns = {'pds4': 'http://pds.nasa.gov/pds4/pds/v1'} label = ET.parse(file_name) # Find the first File_Area_Observational element with an Array_2D_Image find = label.findall( 'pds4:File_Area_Observational/[pds4:Array_2D_Image]', ns) if len(find) > 1: raise NotImplementedError("Multiple Array_2D_Image elements found.") else: file_area = find[0] data, local_identifier = read_pds4_array( file_area, './pds4:Array_2D_Image', ns, dirname=os.path.dirname(file_name)) return PDS4_Array_2D_Image(data, local_identifier, label) def read_pds4_array(file_area, xpath, ns, dirname=''): """Read a PDS4 data array. Parameters ---------- file_area : ElementTree Element The File_Area_Observational element from the PDS4 label. xpath : string The array is described by the element `file_area.find(xpath)`. ns : dictionary Namespace definitions for `file_area.find()`. dirname : string The original data label's directory name, used to find the array file. Returns ------- data : ndarray The array. local_identifier : string The local_identifier of the array, or `None` if not present. Raises ------ NotImplementedError """ import os file_name = file_area.find('pds4:File/pds4:file_name', ns).text.strip() file_name = os.path.join(dirname, file_name) array = file_area.find(xpath, ns) local_identifier = array.find('./pds4:local_identifier', ns).text.strip() # Examine the data type, and translate it into a numpy dtype pds4_to_numpy_dtypes = { "IEEE754MSBSingle": '>f4' } try: k = array.find('pds4:Element_Array/pds4:data_type', ns).text.strip() dtype = np.dtype(pds4_to_numpy_dtypes[k]) except KeyError: raise NotImplementedError("PDS4 data_type {} not implemented.".format(k)) # determine the shape ndim = int(array.find('pds4:axes', ns).text.strip()) shape = () for i in range(ndim): k = ('./pds4:Axis_Array/[pds4:sequence_number="{}"]/pds4:elements').format(i + 1) shape += (int(array.find(k, ns).text.strip()), ) # verify axis order axis_index_order = array.find('./pds4:axis_index_order', ns).text.strip() assert axis_index_order == 'Last Index Fastest', "Invalid axis order: {}".format(axis_index_order) offset = array.find('./pds4:offset', ns) assert offset.attrib['unit'].lower() == 'byte', "Invalid file offset unit" with open(file_name, 'r') as inf: inf.seek(int(offset.text.strip())) data = np.fromfile(inf, dtype, count=np.prod(shape)).reshape(shape) return data, local_identifier