Difference between revisions of "Filling Out the Document Format Set Classes"

Latest revision as of 15:38, 29 May 2013

The <Document_Format_Set> class contains a brief description of one of possibly several different physical forms of the same document. For example, a PDF file is one common format for documents, and generally consists of a single file. A Document_Format_Set containing a single Document_File subclass would be used to label that single file. Alternatively, in PDS3, most documents were presented as an ASCII text file and a series of separate graphics files (PNG, GIF, JPEG) for the figures (graphics and/or images). The combined text and graphics would constitute a single form of the document and would be described in PDS4 using a single Document_Format_Set with multiple Document_File subclasses.

For additional explanation, see the PDS4 Standards Reference, or contact your PDS node consultant.

Following are the attributes and subclasses you'll find in the Document_Format_Set, in label order.

Note that in the PDS4 master schema, all classes have capitalized names; attributes never do.

<Document_Format>

REQUIRED

This class must occur exactly once. It provides information that applies to this physical format of the document taken as a whole.

<starting_point_identifier>

OPTIONAL

This is useful if the document format you're describing contains more than one file. The <local_identifier> corresponding to the file that should be considered the starting point for reading the document should be the value of this attribute (you'll have to define a <local_identifier> for that file, of course).

For example, in the ASCII-text-plus-graphics-files format, the <Document_File> object for the ASCII text file should contain a local_identifier so it can be identified here as the main file for that document format.
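The cross-reference described above can be checked mechanically. The following is a minimal sketch, not an official validation tool: the label fragment is hypothetical (the file names and the identifier "main_text" are made up), it is trimmed to the attributes involved in the cross-reference, it omits the XML namespaces a real PDS4 label carries, and its nesting is inferred from the section order on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical label fragment. File names and identifiers are invented,
# namespaces are omitted, and only the attributes needed here are shown.
FRAGMENT = """
<Document_Format_Set>
  <Document_Format>
    <starting_point_identifier>main_text</starting_point_identifier>
    <format_type>multiple file</format_type>
  </Document_Format>
  <Document_File>
    <file_name>report.txt</file_name>
    <local_identifier>main_text</local_identifier>
  </Document_File>
  <Document_File>
    <file_name>figure1.png</file_name>
  </Document_File>
</Document_Format_Set>
"""


def starting_point_file(format_set):
    """Return the file_name whose local_identifier matches the starting point.

    Returns None when no starting point is given or nothing matches it.
    """
    start = format_set.findtext("Document_Format/starting_point_identifier")
    if start is None:
        return None
    for doc_file in format_set.findall("Document_File"):
        # Identifiers are case-sensitive, so compare them exactly.
        if doc_file.findtext("local_identifier") == start:
            return doc_file.findtext("file_name")
    return None


print(starting_point_file(ET.fromstring(FRAGMENT)))  # prints "report.txt"
```

A result of None for a label that does set starting_point_identifier would indicate a dangling reference.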

<format_type>

REQUIRED

The format_type must be set to one of the values single file or multiple file, depending on whether there is one <Document_File> class following, or more than one, respectively.
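If labels are generated by a script, the value can be derived from the file count. This is a small sketch; the function name format_type_for is hypothetical, and the two strings are the values quoted above.

```python
def format_type_for(file_names):
    """Pick the format_type value from the number of Document_File entries."""
    count = len(file_names)
    if count == 0:
        # Document_File is required, so an empty list is an error.
        raise ValueError("a document format needs at least one file")
    return "single file" if count == 1 else "multiple file"


print(format_type_for(["manual.pdf"]))                   # prints "single file"
print(format_type_for(["manual.txt", "figure1.png"]))    # prints "multiple file"
```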

<description>

OPTIONAL

This attribute provides a place for a free-format text description of this particular format of the document. For example, if this format resulted from scanning a paper copy, you can use this attribute to mention that and credit/thank the source.

Description of the document content (that is, the logical content, not the physical file structure) should be in the <Document> class.

<Document_File>

REQUIRED

This class identifies and describes one of the files comprising this particular physical form of the document. There must be one <Document_File> class for each file in the format described by the <Document_Format_Set> containing this class.

<file_name>

REQUIRED

The name of the file being described, without any directory path information (which can be included below in directory_path_name, if needed). The name is case-sensitive.

<local_identifier>

OPTIONAL

This attribute holds a simple identifier to be used to cross-reference this file description from elsewhere in the label. This is required for the first file (the file a user should examine first) of a document format that contains multiple files. It is optional in all other cases. Case is significant for this as well.

<creation_date_time>

OPTIONAL

If you would like to document the creation date of the file named in file_name, this is the place to do it. The date and (optional) time must be in the ISO 8601 standard format.
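As an illustration, one way to produce an ISO 8601 date-time string from a script is sketched below. Writing the time in UTC with a trailing "Z" is an assumption made for this example; this page only requires ISO 8601 format. The helper name iso8601_utc is hypothetical.

```python
from datetime import date, datetime, timezone


def iso8601_utc(moment):
    """Format a timezone-aware datetime as an ISO 8601 UTC string ending in Z."""
    if moment.tzinfo is None:
        # A naive datetime would be silently treated as local time.
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(iso8601_utc(datetime(2013, 5, 29, 15, 38, tzinfo=timezone.utc)))
# prints "2013-05-29T15:38:00Z"

# When only the date is known, date.isoformat() gives the date-only form.
print(date(2013, 5, 29).isoformat())  # prints "2013-05-29"
```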

<file_size>

OPTIONAL

The size of the file in bytes. The value must not contain any punctuation ("12345", not "12,345") and should be accurate to the byte.
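A sketch of reading the exact byte count from the file system, so the value does not have to be typed by hand. The function name file_size_value is hypothetical.

```python
from pathlib import Path


def file_size_value(path):
    """Return the file size in bytes as a plain digit string, e.g. '12345'."""
    # str() of an int never adds thousands separators or other punctuation.
    return str(Path(path).stat().st_size)
```

Calling file_size_value on the file named in file_name gives a string that can be placed in the label unchanged.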

<records>

OPTIONAL

This is the total number of records in this file. Note that the concept of "record" is not defined. In a flat text file, this is usually taken as the number of lines delimited by carriage control (which can vary). In binary files, this may be something like the count of rows in a binary table, provided the file only contains one data object. In other cases this cannot be defined.
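For the flat text case, one possible counting rule is sketched below. Since "record" is not defined, the rule is an assumption: each line-feed byte ends a record (this covers both LF and CR/LF files), and a final line without a terminator still counts. Files that use a bare carriage return as the delimiter are not handled. The function name count_text_records is hypothetical.

```python
def count_text_records(path):
    """Count lines in a text file, treating each LF (or CR/LF) as a terminator."""
    with open(path, "rb") as handle:
        data = handle.read()
    records = data.count(b"\n")
    # A last line with no trailing terminator is still counted as a record.
    if data and not data.endswith(b"\n"):
        records += 1
    return records
```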

<md5_checksum>

OPTIONAL

If you prefer to track the MD5 checksum of a data file in its PDS label, here's an attribute to hold it. In this context it must be the MD5 checksum of the indicated file as a whole, not of a part of the file.
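A minimal sketch of computing the checksum over the whole file with Python's standard hashlib module. Reading in chunks keeps memory use low for large files; the chunk size is arbitrary. Writing the digest as lowercase hexadecimal is an assumption, since this page does not state the expected notation, and the function name md5_of_file is hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 digest of the entire file as a lowercase hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until read() returns an empty bytes object at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```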

<comment>

OPTIONAL

This attribute provides a place for free-format text to add any additional explanation or credits relevant to this particular file.

<directory_path_name>

OPTIONAL

If directory path information is needed to find the named file (that is, if it is not in the same directory as the label describing it) the path relative to the label file goes here.

The file must be in either the same directory as the label or a subdirectory of that directory. Paths should follow the Unix/Linux convention and use '/' as a level separator; case is significant. Do not include the file name with the path information.

Note: The details of the format for this value are not defined in the PDS4 Data Dictionary or Standards Reference Release 1.0, nor are there any format constraints in the data type definition in the schemas. Although an "ASCII_Directory_Path_Name" type is defined in the XSD schema, it does not constrain the format of the field beyond the requirement that it contain ASCII, and even then that data type is not used to define the <directory_path_name> attribute.

I think requiring Linux path rules is the intention, but I'm not sure. Notwithstanding, always use Unix/Linux-style paths for data coming into the SBN so we have a consistent base to work with.
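The rules above (Unix-style separators, a location at or below the label's directory, and no file name in the path) can be sketched as a helper that splits a label-relative path into its two label values. The function name split_label_relative_path is hypothetical, and whether the directory value should end with a trailing '/' is not stated on this page, so none is added here.

```python
from pathlib import PurePosixPath


def split_label_relative_path(relative_path):
    """Split 'dir/sub/file.ext' into (directory_path_name, file_name).

    The directory part is '' when the file sits next to the label, in which
    case directory_path_name can simply be left out of the label.
    """
    if "\\" in relative_path:
        raise ValueError("use '/' as the level separator")
    path = PurePosixPath(relative_path)
    if not path.name:
        raise ValueError("no file name given")
    if path.is_absolute() or ".." in path.parts:
        raise ValueError("file must be at or below the label's directory")
    directory = "" if str(path.parent) == "." else str(path.parent)
    return directory, path.name


print(split_label_relative_path("document/figures/fig1.png"))
# prints ('document/figures', 'fig1.png')
```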

<document_standard_id>

REQUIRED

This required attribute should have a value from the standard value list you can find on the Standard Values Quick Reference page.
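Putting the pieces together, the sketch below assembles a Document_Format_Set element with Python's xml.etree.ElementTree, writing the attributes in the order they are listed on this page. It is an illustration under stated assumptions, not a label generator: the nesting is inferred from the section order here, XML namespaces are omitted, the function and constant names are hypothetical, and the document_standard_id values shown are placeholders that must be replaced with entries from the standard value list.

```python
import xml.etree.ElementTree as ET

# Document_File attributes in the label order given on this page.
FILE_ATTRIBUTE_ORDER = (
    "file_name", "local_identifier", "creation_date_time", "file_size",
    "records", "md5_checksum", "comment", "directory_path_name",
    "document_standard_id",
)
REQUIRED_FILE_ATTRIBUTES = ("file_name", "document_standard_id")


def build_document_format_set(files, starting_point=None, description=None):
    """Build a Document_Format_Set element from a list of per-file dicts."""
    if not files:
        raise ValueError("at least one Document_File is required")
    root = ET.Element("Document_Format_Set")
    fmt = ET.SubElement(root, "Document_Format")
    if starting_point is not None:
        ET.SubElement(fmt, "starting_point_identifier").text = starting_point
    ET.SubElement(fmt, "format_type").text = (
        "single file" if len(files) == 1 else "multiple file"
    )
    if description is not None:
        ET.SubElement(fmt, "description").text = description
    for info in files:
        missing = [name for name in REQUIRED_FILE_ATTRIBUTES if name not in info]
        if missing:
            raise ValueError("missing required attributes: " + ", ".join(missing))
        doc_file = ET.SubElement(root, "Document_File")
        for name in FILE_ATTRIBUTE_ORDER:
            if name in info:
                ET.SubElement(doc_file, name).text = str(info[name])
    return root


# Placeholder values throughout; none of these come from a real label.
example = build_document_format_set(
    [
        {"file_name": "report.txt", "local_identifier": "main_text",
         "document_standard_id": "PLACEHOLDER_TEXT_STANDARD"},
        {"file_name": "figure1.png",
         "document_standard_id": "PLACEHOLDER_IMAGE_STANDARD"},
    ],
    starting_point="main_text",
)
print(ET.tostring(example, encoding="unicode"))
```

Keeping the attribute order in a single tuple means the per-file dictionaries can be filled in any order while the output still follows the label order.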