Difference between revisions of "Python PDS4 Tools"

Revision as of 02:24, 10 June 2017

Python and PDS4

This document describes the current status and usage of Python tools developed at PDS-SBN to read and visualize PDS4 data in Python. Please note that a PDS4 reader and visualizer for IDL is also available.

Reading and Displaying PDS4 Data

Introduction

This section describes a Python package that can read and display PDS4 data and meta data. The tool is expected to support all PDS4 data structures in the future; currently, support is limited to the structures listed in the Supported Data Structures section. The package expects valid PDS4 labels formatted according to the PDS4 Standard.

Contact Lev Nagdimunov with questions or comments regarding this code or its description.

Requirements

Python 2.6+ or 3.3+

pds4_read: NumPy
pds4_viewer: NumPy, matplotlib
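
Before installing, it can help to confirm that the interpreter and the packages listed above are available. The following is a minimal sketch, not part of pds4_tools: the helper names missing_packages and python_supported are hypothetical, and the snippet itself relies on importlib.util, so it needs Python 3.4 or later to run.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_supported(version=sys.version_info):
    """True for Python 2.6+ or 3.3+, the versions listed in Requirements."""
    major, minor = version[0], version[1]
    return (major == 2 and minor >= 6) or (major == 3 and minor >= 3)


print('Python supported:', python_supported())
print('Missing for pds4_read:', missing_packages(['numpy']))
print('Missing for pds4_viewer:', missing_packages(['numpy', 'matplotlib']))
```

An empty list means every required package for that tool was found.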

Supported Data Structures

PDS4 Data Standards >= v1.0 are supported.
PDS3 Data Standards are not supported.

The table below lists the main PDS4 data structures and the current status.

The Read-in column indicates support by pds4_read().
The Display columns indicate support by pds4_viewer().

Structure         Read-in                 Display as Table   Display as Image   Display Columns as Plot
Header            Yes                     No                 No                 No
Array             Yes                     Yes                Yes, N-dims        Yes, 1-D only
Array_2D          Yes                     Yes                Yes                No
Array_2D_*        Yes                     Yes                Yes                No
Array_3D          Yes                     Yes                Yes                No
Array_3D_*        Yes                     Yes                Yes                No
Table_Character   Yes                     Yes                No                 Yes
Table_Binary      Yes, except BitFields   Yes                No                 Yes
Table_Delimited   Yes                     Yes                No                 Yes

User Manual

An HTML User Manual is available online.

An API Quick Start, for developers, is also available.

Download

Download the ZIP file File:PDS4 tools-0.8.zip. Released on June 9, 2017.

Note: A distributable version of the viewer only, which does not require Python, is available.

Installation

Option 1

Use "pip install PDS4_tools-0.8.zip" or "easy_install PDS4_tools-0.8.zip"; both tools are typically included with Python distributions. You can also extract the ZIP file and use "python /path/to/extraction_directory/setup.py install". Note that no uninstall script is provided (although "pip uninstall pds4_tools" should work), and that this tool will be updated in the future.

Option 2

Extract the downloaded file to a directory Python can find. To use it, follow the instructions in Example Usage, but begin with the following lines:

import sys
sys.path.extend(['/path/to/extraction_directory/'])

from pds4_tools import pds4_read, pds4_viewer

# The extraction_directory is the one that includes setup.py
# On a Windows machine use forward slashes (/) instead of Windows' normal backslashes (\) to specify paths
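
The reason for preferring forward slashes in Windows paths is that a backslash inside an ordinary Python string can start an escape sequence (for example, "\t" is a tab). The sketch below illustrates this with a hypothetical extraction directory; the path is an assumption made only for the example.

```python
import ntpath  # Windows flavour of os.path; importable on every platform

# Hypothetical extraction directory, written three equivalent ways
forward = 'C:/tools/PDS4_tools-0.8'        # forward slashes, no escaping needed
escaped = 'C:\\tools\\PDS4_tools-0.8'      # each backslash doubled
raw = r'C:\tools\PDS4_tools-0.8'           # raw string, backslashes kept as-is

# normpath converts forward slashes to the native Windows separator
normalized = ntpath.normpath(forward)

print(normalized == raw)
print(escaped == raw)
```

Any of the three spellings can be passed to sys.path.extend; the forward-slash form is simply the hardest to get wrong.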

Example Usage

See also the User Manual.

pds4_read

Import via "from pds4_tools import pds4_read". You may then call pds4_read from your own code. The following is the docstring for pds4_read:

        Reads PDS4 compliant data into a `StructureList`.

        Given a PDS4 label, reads the PDS4 data described in the label and
        associated label meta data into a `StructureList`, with each PDS4 data
        structure (e.g. Array_2D, Table_Binary, etc) as its own `Structure`. By
        default all data structures described in the label are immediately
        read into memory.

        Notes
        -----
        Python 2 v. Python 3: Non-data strings (label, meta data, etc)  in
        Python 2 will be decoded to ``unicode`` and in Python 3 they will
        be decoded to ``str``. The return type of all data strings is
        controlled by *decode_strings*.

        Parameters
        ----------
        filename : str or unicode
            The filename, including full or relative path if necessary, of
            the PDS4 label describing the data.
        quiet : bool, optional
            Suppresses all info/warnings from being output.
        lazy_load : bool, optional
            If True, then the data of each PDS4 data structure will not be
            read-in to memory until the first attempt to access it. Defaults
            to False.
        no_scale : bool, optional
            If True, returned data will be exactly as written in the data file,
            ignoring offset or scaling values. Defaults to False.
        decode_strings : bool, optional
            If True, string data types contained in the returned data will be
            decoded to the ``unicode`` type in Python 2, and to the ``str`` type in
            Python 3. If False, leaves string types as byte strings.
            Defaults to True.

        Returns
        -------
        StructureList
            Contains PDS4 data `Structure` objects, each of which contains the data,
            the meta data and the label portion describing that data structure.
            `StructureList` can be treated/accessed/used like a ``dict`` or
            ``list``.

        Examples
        --------

        Below we document how to read data described by an example label
        which has two data structures, an Array_2D_Image and a Table_Binary.
        An outline of the label, including the array and a table with 3
        fields, is given.

        >>> struct_list = pds4_read('/path/to/Example_Label.xml')

        Example Label Outline::

           Array_2D_Image: unnamed
           Table_Binary: Observations
               Field: order
               Field: wavelength
               Group: unnamed
                   Field: pos_vector

        All below documentation assumes that the above outlined label,
        containing an array that does not have a name indicated in the label,
        and a table that has the name 'Observations' with 3 fields as shown,
        has been read-in.

        Accessing Example Structures:

            To access the data structures in `StructureList`, which is returned
            by `pds4_read()`, you may use any combination of ``dict``-like or
            ``list``-like access.

            >>> unnamed_array = struct_list[0]
            >>>              or struct_list['ARRAY_0']

            >>> obs_table = struct_list[1]
            >>>          or struct_list['Observations']

        Label or Structure Overview:

            To see a summary of the data structures, which for Arrays shows the
            type and dimensions of the array, and for Tables shows the type
            and number of fields, you may use the `StructureList.info()` method.
            Calling `Structure.info()` on a specific ``Structure`` instead will
            provide a more detailed summary, including all Fields for a table.

            >>> struct_list.info()
            >>> unnamed_array.info()
            >>> obs_table.info()

        Accessing Example Label data:

            To access the read-in data, as an array-like (subclass of ``ndarray``),
            you can use the data attribute for a PDS4 Array data structure, or
            list-like indexing and the field() method to access a field for a table.

            PDS4 Arrays
            >>> unnamed_array.data

            PDS4 Table fields
            >>> obs_table['wavelength']
            >>> obs_table.field('wavelength')

            PDS4 Table records
            >>> obs_table[0:1000]

        Accessing Example Label meta data:

            You can access all meta data in the label for a given PDS4 data
            structure or field via the ``OrderedDict`` meta_data attribute. The
            below examples use the 'description' element.

            >>> unnamed_array.meta_data['description']

            >>> obs_table.field('wavelength').meta_data['description']
            >>> obs_table.field('pos_vector').meta_data['description']

        Accessing Example Label:

            The XML for a label is also accessible via the label attribute,
            either the entire label or for each PDS4 data structure.

            Entire label:
                >>> struct_list.label

            Part of label describing Observations table:
                >>> struct_list['Observations'].label
                >>> struct_list[1].label

            The returned object is similar to an ElementTree instance. It is
            searchable via `Label.find()` and `Label.findall()` methods and XPATH.
            Consult ``ElementTree`` manual for more details. For example,

            >>> struct_list.label.findall('.//disp:Display_Settings')

            Will find all elements in the entire label named 'Display_Settings'
            which are in the 'disp' prefix's namespace. You can additionally use the
            `Label.to_dict()` and `Label.to_string()` methods.
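
The optional parameters documented in the docstring can be combined in a single call. The following is a minimal sketch under stated assumptions: the wrapper name read_unscaled_lazily and its reader argument are hypothetical and exist only so the call can be exercised without a real label; the keyword names (quiet, lazy_load, no_scale, decode_strings) are those given in the docstring.

```python
def read_unscaled_lazily(label_path, reader=None):
    """Read a PDS4 label quietly, deferring data read-in and skipping scaling.

    `reader` is a hypothetical hook for testing; when omitted, the real
    pds4_read from pds4_tools is imported and used.
    """
    if reader is None:
        from pds4_tools import pds4_read as reader  # requires pds4_tools

    return reader(
        label_path,
        quiet=True,            # suppress info/warning output
        lazy_load=True,        # read each structure's data on first access
        no_scale=True,         # return values exactly as stored in the file
        decode_strings=False,  # leave string data as byte strings
    )
```

With pds4_tools installed, read_unscaled_lazily('/path/to/Example_Label.xml') would return the same kind of StructureList described above.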

Usage is described above. A basic usage example is as follows:

""" Basic pds4_read example """

from pds4_tools import pds4_read

structures = pds4_read('/path/to/label.xml')

structures.info()

# Example output:
# 0 - Array_3D_Spectrum 'array_name' (3 axes, 21 x 10 x 36)
# 1 - Table_Binary 'table_name' (5 fields x 1000 records)

# Table data access
table = structures['table_name'] # or
table = structures[1]

table.info()

field_data = table.field('field_name') # or
field_data = table.fields[0]

record_data = table[0:50]

# Array data access
array = structures['array_name'] # or
array = structures[0]

array_data = array.data

# Meta-data access
field_meta = table.field('field_name').meta_data # or
field_meta = table.fields[0].meta_data
array_meta = array.meta_data

# print() with a single argument works in both Python 2 and Python 3
print(field_meta['description'])
print(field_meta['unit'])
print(array_meta['local_identifier'])

# Label access
label = structures.label # Full label
label = table.label      # Label section describing the table object

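display_settings = label.findall('.//disp:Display_Settings')

# findall() returns a list of matches, so convert each match separately
display_dicts = [settings.to_dict() for settings in display_settings]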
label_dict = label.to_dict()
label_string = label.to_string()
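
Since the label object is described as similar to an ElementTree instance, the namespace-prefixed search can be illustrated with the standard library alone. This is a sketch using xml.etree.ElementTree, not the pds4_tools Label class: the XML fragment is invented for the example, the namespace URIs are assumed values, and the standard library needs an explicit prefix-to-URI mapping, which the pds4_tools call shown above does not take.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the 'disp' prefix (illustrative)
DISP_NS = 'http://pds.nasa.gov/pds4/disp/v1'

# Invented label fragment, much smaller than a real PDS4 label
xml_text = """<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1"
                       xmlns:disp="http://pds.nasa.gov/pds4/disp/v1">
  <Observation_Area>
    <Discipline_Area>
      <disp:Display_Settings>
        <disp:Display_Direction>
          <disp:horizontal_display_direction>Left to Right</disp:horizontal_display_direction>
        </disp:Display_Direction>
      </disp:Display_Settings>
    </Discipline_Area>
  </Observation_Area>
</Product_Observational>"""

root = ET.fromstring(xml_text)
namespaces = {'disp': DISP_NS}  # maps the prefix used in the XPath to its URI

# './/' searches all descendants; 'disp:' selects the namespace
matches = root.findall('.//disp:Display_Settings', namespaces)
direction = matches[0].find('.//disp:horizontal_display_direction', namespaces).text

print(len(matches), direction)
```

As here, findall yields a list of matching elements, each of which can be searched further with find.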

pds4_viewer

Import via "from pds4_tools import pds4_viewer". To display the data structures (such as images, spectra, or tables) in a label you may then call pds4_viewer from the Python interpreter, with or without any arguments:

    Displays PDS4 compliant data in a GUI.

    Given a PDS4 label, displays PDS4 data described in the label and
    associated label meta data in a GUI. By default all data structures described
    in the label are read-in and displayed. Can be called without any
    parameters, opening a GUI that has a File->Open function to select
    desired label to be read-in and displayed.

    Parameters:

        filename : str, optional
            The filename, including full or relative path if necessary, of
            the PDS4 label describing the data to be viewed.
        from_existing_structures : StructureList, optional
            An existing StructureList, as returned by pds4_read(), to view. Takes
            precedence if given together with filename.
        lazy_load : bool, optional
            Do not read-in data of each data structure until attempt to view said
            data structure. Defaults to True.
        quiet : bool, optional
            If True, suppresses all info/warnings from being output and displayed.
            Defaults to False.

It is not necessary to include the filename parameter for pds4_viewer; you may simply call it without any options or arguments, and a GUI will open from which you can open labels.

You may also call pds4_viewer from another module or script. All the above arguments are available as optional named parameters. A basic example usage is as follows:

""" Basic pds4_viewer example """

from pds4_tools import pds4_read, pds4_viewer

pds4_viewer()

# or

pds4_viewer('/path/to/label.xml')

# or 

struct_list = pds4_read('label.xml')
pds4_viewer(from_existing_structures=struct_list) # Won't re-read the data