Filling Out the Document Format Set Classes

From The SBN Wiki
Revision as of 22:01, 18 January 2013 by Raugh (talk | contribs)

The <Document_Format_Set> class contains a brief description of one of possibly several different physical forms of the same document. For example, a PDF file is one common form for documents. In PDS3, most documents were presented as an ASCII text file and a series of images for the graphics. The combined text and graphics would constitute a single form of the document.

For additional explanation, see the PDS4 Standards Reference, or contact your PDS node consultant.

Following are the attributes and subclasses you'll find in the Document_Format_Set, in label order.

Note that in the PDS4 master schema, all classes have capitalized names; attributes never do.



This class must occur exactly once. It provides information that applies to this physical format of the document.



This is useful if the document format you're describing contains more than one file. Set this attribute to the <local_identifier> of the file that should be considered the starting point for reading the document (you'll have to define a <local_identifier> for that file, of course).

For example, in the ASCII text plus graphics files example, the <Document_File> object for the ASCII text file should contain a local_identifier so it can be identified here as the main file for that document format.



The format_type must be set to one of the values single_file or multiple_file, depending on whether there is one <Document_File> class following, or more than one, respectively.
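
The rule above is simple enough to derive from your file list. Here is a minimal sketch in Python; the function name format_type_for is hypothetical and not part of any PDS tool, and the zero-file check is an added assumption based on the requirement that each format have at least one <Document_File>.

```python
def format_type_for(document_files):
    """Return the format_type value for the files in one document format.

    document_files is a list with one entry per <Document_File> class
    that will follow in the label (hypothetical helper, for illustration).
    """
    count = len(document_files)
    if count == 0:
        # Assumption: a format with no files cannot be labeled at all.
        raise ValueError("a document format needs at least one file")
    return "single_file" if count == 1 else "multiple_file"


print(format_type_for(["report.pdf"]))
print(format_type_for(["report.txt", "fig1.png", "fig2.png"]))
```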



This attribute provides a place for a free-format text description of this particular format of the document. For example, if this format resulted from scanning a paper copy, you can use this attribute to mention that and credit/thank the source.

Description of the document content (that is, the logical content, not the physical file structure) should be in the <Document> class.



This class identifies and describes one of the files comprising this particular physical form of the document. There must be one <Document_File> class for each file in the format described by the <Document_Format_Set> containing this class.



The name of the file being described, without any directory path information (which can be included below in directory_path_name, if needed). The name is case-sensitive.



This attribute holds a simple identifier to be used to cross-reference this file description from elsewhere in the label. This is required for the first file (the file a user should examine first) of a document format that contains multiple files. It is optional in all other cases. Case is significant for this as well.



If you would like to document the creation date of the file named in file_name, this is the place to do it. The date and (optional) time must be in the ISO 8601 standard format.
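
If you don't have the creation date recorded anywhere, one way to get a candidate value is to read the file's timestamp and format it in ISO 8601. This is only a sketch: the function name iso8601_timestamp is hypothetical, the modification time is used as a stand-in for the creation time (most file systems do not reliably record the latter), and UTC with a trailing "Z" is an assumed choice of time zone notation.

```python
import os
from datetime import datetime, timezone


def iso8601_timestamp(path):
    """Return the file's modification time as an ISO 8601 UTC string."""
    # Assumption: modification time approximates the creation time.
    mtime = os.path.getmtime(path)
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Check the result against what you actually know about the file; copying a file often resets its timestamp.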



The size of the file, in bytes. The value must not contain any punctuation ("12345", not "12,345") and should be accurate to the byte.
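
Rather than copying the size from a directory listing (which may round it or add separators), you can ask the operating system directly. A minimal sketch, with the hypothetical helper name file_size_value:

```python
import os


def file_size_value(path):
    """Return the exact size of the file in bytes as a plain digit string."""
    # str() of an int never adds thousands separators or units.
    return str(os.path.getsize(path))
```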



This is the total number of records in this file. Note that the concept of "record" is not defined. In a flat text file, this is usually taken as the number of lines delimited by carriage control (which can vary). In binary files, this may be something like the count of rows in a binary table, provided the file only contains one data object. In other cases this cannot be defined.
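
For the flat text file case, here is a sketch of one reasonable way to count records. It assumes a record is a line ended by CR, LF, or CR-LF, and it counts a final line with no terminator as a record too; both are assumptions, since "record" is not defined. The name count_text_records is hypothetical.

```python
def count_text_records(path):
    """Count lines in a text file, accepting CR, LF, or CR-LF terminators."""
    with open(path, "rb") as handle:
        data = handle.read()
    # bytes.splitlines() splits on \n, \r, and \r\n only, and does not
    # produce an extra empty entry after a trailing terminator.
    return len(data.splitlines())
```

This reads the whole file into memory, which is fine for typical document files but not for very large ones.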



If you prefer to track the MD5 checksum of the file in this label, here's an attribute to hold it. In this context it must be the MD5 checksum of the indicated file as a whole, not of a part of the file.
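
A sketch of computing the checksum over the entire file with Python's standard hashlib module follows. The helper name md5_checksum and the chunk size are arbitrary choices; the lowercase hexadecimal output is what hashlib produces, and whether the label requires a particular case is not addressed here.

```python
import hashlib


def md5_checksum(path, chunk_size=65536):
    """Return the MD5 checksum of the whole file as a hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Opening the file in binary mode matters: text mode can translate line endings and change the result.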



This attribute provides a place for free-format text to add any additional explanation or credits relevant to this particular file.



If directory path information is needed to find the named file (that is, if it is not in the same directory as the label describing it), the path relative to the label file goes here.

The file must be in either the same directory as the label or a subdirectory of that directory. Paths should follow the Unix/Linux convention and use '/' as a level separator; case is significant. Do not include the file name with the path information.

Note: The details of the format for this value are not, apparently, defined in the PDS4 documentation. I think the above rules are valid, but I'm not sure.
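
Taking the rules above at face value (and the note says they are unconfirmed), here is a sketch of splitting a file's location into a directory path and a bare file name relative to the label. The function name split_for_label is hypothetical, and returning the path without a trailing '/' is an assumption.

```python
import os


def split_for_label(label_path, file_path):
    """Return (directory_path, file_name) for file_path relative to the label.

    Raises ValueError if the file is not in the label's directory or in
    a subdirectory of it.
    """
    label_dir = os.path.dirname(os.path.abspath(label_path))
    relative = os.path.relpath(os.path.abspath(file_path), label_dir)
    parts = relative.split(os.sep)
    if parts[0] == os.pardir:
        raise ValueError("file is outside the label's directory tree")
    # Always join with '/', whatever separator the local system uses.
    return "/".join(parts[:-1]), parts[-1]


print(split_for_label("/data/doc/label.xml", "/data/doc/figs/fig1.png"))
```

An empty directory path means the file sits beside the label, in which case no path information is needed.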



This required attribute should have a value from the standard value list you can find on the Questions page.

Note: Most of the values on this list are not appropriate: either they are not accurate or they are not sufficiently specific, and none of them include version numbers, which are critical in more than a few cases. Until this is corrected, please use the <comment> field to indicate the precise standard and version used, in the correct notation for that standard.