Difference between revisions of "Filling Out the File Area Text Class"

From The SBN Wiki
(Update for Release 1.0)
(→‎<parsing_standard_id>: Update for IM 1.14.0.0)
 
(3 intermediate revisions by the same user not shown)
Line 49: Line 49:

  ''REQUIRED''

- The value should identify the standard that tells you how to read the information in the file.
+ The value identifies the text standard used in the file.  It must be one of these:
+
+ * '''7-Bit ASCII Text''' if the file contains ''only'' ASCII characters
+ * '''PDS3''' if the file contains a PDS3 label
+ * '''UTF-8 Text''' if the file contains UTF-8 characters

  {| class="wikitable" style="background-color: yellow"
- | '''Note:''' There should be a standard value list for this, but there isn't.
- Until there is, use one of these or ask for advice:
- * If you have a simple ASCII text file, use "7-Bit ASCII Text".
- * If you have a simple UTF-8 text file, use "UTF-8 Text".
- * If you have an HTML file but don't know what standard it's coded to, use "HTML 4.0".
- * If you have an HTML file and you do know what standard it's coded to, use the closest thing from this list:
- ** HTML 2.0
- ** HTML 3.2
- ** HTML 4.0
- ** HTML 4.1
+ | '''Note:''' ''Although the value '''''PDS3''''' is valid here, you should almost certainly not be referencing PDS3 labels to document your PDS4 Bundle.''
+ |}
+
+ 7-bit ASCII is a strict subset of UTF-8.  In either case, avoid non-printing characters other than the line delimiters and the blank characters, especially if your text file depends on a fixed-space font to display correctly.
+
+ {| class="wikitable" style="background-color: lightcyan"
+ | '''Note:''' You should be careful to ensure that your file actually contains only ASCII or UTF-8 characters.  This can be a little tricky, especially on certain operating systems still using their own proprietary code pages as defaults for their very popular office software. If you don't know for certain that your editor is operating in UTF-8 mode, please check your settings and read the documentation ''before'' submitting a file with potentially spurious characters in it.  This is ''really'' annoying to chase down and correct after the fact.

  |}
  
Line 74: Line 75:

  ''REQUIRED''

- This must have the value '''carriage-return line-feed'''.  The corresponding text must also have carriage-return/linefeed delimited lines.
+ This must have the value '''Carriage-Return Line-Feed'''.  The corresponding text must also have carriage-return/linefeed delimited lines.

Latest revision as of 21:16, 3 August 2020

The <File_Area_Text> is a specific flavor of the more general <File_Area> class, designed for pointing to text files (like ReadMe files) associated with things like Bundles.

Note that the text file described by this class must have carriage-return/linefeed line delimiters.

<File>

REQUIRED

This class is handled identically in all the File_Area_* classes. It is described in detail in the Filling Out the File Class page.

<Stream_Text>

REQUIRED

This class provides metadata for the file referenced in the associated File class. There must be exactly one instance of this class in a File_Area_Text.
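
The "exactly one" rule can be checked mechanically. The following is a minimal sketch, not an official validator: the helper names and the sample fragment (including its `<Label>` wrapper element) are made up for illustration, and the code ignores XML namespaces so that it works whether or not the label declares one.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def stream_text_counts(label_xml):
    """Return the number of Stream_Text children of each File_Area_Text."""
    root = ET.fromstring(label_xml)
    counts = []
    for elem in root.iter():
        if local_name(elem.tag) == "File_Area_Text":
            n = sum(1 for child in elem if local_name(child.tag) == "Stream_Text")
            counts.append(n)
    return counts

# Hypothetical fragment; <Label> is only a stand-in root element.
sample = """
<Label>
  <File_Area_Text>
    <File><file_name>readme.txt</file_name></File>
    <Stream_Text>
      <offset unit="byte">0</offset>
      <parsing_standard_id>7-Bit ASCII Text</parsing_standard_id>
      <record_delimiter>Carriage-Return Line-Feed</record_delimiter>
    </Stream_Text>
  </File_Area_Text>
</Label>
"""

print(stream_text_counts(sample))  # [1]
```

Any value other than 1 in the returned list points at a File_Area_Text that breaks the rule.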

<name>

OPTIONAL

The name of the file itself goes in the required file_name attribute, so in general this attribute should only be used to provide something like a human-readable title for the contents of the file.

<local_identifier>

OPTIONAL

Use this if you need to create an identifier for this text data so you can reference it from other places in the label.

<offset>

REQUIRED

This is the offset, in bytes, into the file at which the text begins. This value should almost always be zero; talk to your PDS consultant if you have a case where you believe it isn't. In any event, you must specify "byte" as the unit for this attribute, thus:

    <offset unit="byte">0</offset>

<object_length>

OPTIONAL

This is the length of the text, in bytes. If the offset is zero, this should be the length of the file. You must specify "byte" as the unit for this attribute, thus:

    <object_length unit="byte">1234567890</object_length>
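
Rather than typing the length by hand, it can be read from the file system. This is a rough sketch: `object_length_element` is a hypothetical helper, and for a non-zero offset it assumes the text runs from the offset to the end of the file, which this page does not state.

```python
import os
import tempfile

def object_length_element(path, offset=0):
    """Format an <object_length> element for the text in the file at `path`.

    Assumption: the text runs from `offset` to the end of the file.
    """
    size = os.path.getsize(path)  # file size in bytes
    if not 0 <= offset <= size:
        raise ValueError("offset must lie within the file")
    return '<object_length unit="byte">%d</object_length>' % (size - offset)

# Usage: a throwaway 7-byte file stands in for a real ReadMe.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hello\r\n")
print(object_length_element(tmp.name))  # <object_length unit="byte">7</object_length>
os.remove(tmp.name)
```

Note that the count is in bytes, not characters, so each carriage-return/linefeed pair contributes two bytes.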

<parsing_standard_id>

REQUIRED

The value identifies the text standard used in the file. It must be one of these:

  • 7-Bit ASCII Text if the file contains only ASCII characters
  • PDS3 if the file contains a PDS3 label
  • UTF-8 Text if the file contains UTF-8 characters

Note: Although the value PDS3 is valid here, you should almost certainly not be referencing PDS3 labels to document your PDS4 Bundle.

7-bit ASCII is a strict subset of UTF-8. In either case, avoid non-printing characters other than the line delimiters and the blank characters, especially if your text file depends on a fixed-space font to display correctly.
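
One way to look for stray non-printing characters is to scan the raw bytes. The sketch below is illustrative only: `find_nonprinting` is a hypothetical helper, it checks only the ASCII control range (bytes below 0x20, plus DEL), and whether a tab counts as a "blank character" is an assumption left to the caller through `allow_tab`.

```python
ALLOWED_CONTROL = {0x0D, 0x0A}  # carriage return, line feed

def find_nonprinting(data, allow_tab=False):
    """Return (byte_offset, byte_value) for each disallowed control byte.

    Only ASCII-range control bytes are checked; bytes >= 0x80 (multi-byte
    UTF-8 sequences) are not inspected here.
    """
    allowed = set(ALLOWED_CONTROL)
    if allow_tab:
        allowed.add(0x09)  # assumption: treat tab as a blank character
    hits = []
    for i, b in enumerate(data):
        if (b < 0x20 or b == 0x7F) and b not in allowed:
            hits.append((i, b))
    return hits

# A BEL byte at offset 4 and a tab at offset 6.
print(find_nonprinting(b"ok\r\n\x07x\tz"))                  # [(4, 7), (6, 9)]
print(find_nonprinting(b"ok\r\n\x07x\tz", allow_tab=True))  # [(4, 7)]
```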

Note: You should be careful to ensure that your file actually contains only ASCII or UTF-8 characters. This can be a little tricky, especially on certain operating systems still using their own proprietary code pages as defaults for their very popular office software. If you don't know for certain that your editor is operating in UTF-8 mode, please check your settings and read the documentation before submitting a file with potentially spurious characters in it. This is really annoying to chase down and correct after the fact.
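
A quick check of the raw bytes can catch the code-page problem before submission. This is a minimal sketch under stated assumptions: `suggest_parsing_standard_id` is a hypothetical helper, it never suggests PDS3, and a successful UTF-8 decode only shows that the bytes are well-formed UTF-8, not that every character is the one the author intended.

```python
def suggest_parsing_standard_id(data):
    """Suggest a parsing_standard_id value for raw file bytes.

    Returns None when the bytes are not valid UTF-8, which usually means
    the file was saved in some other encoding (for example a code page).
    """
    if all(b < 0x80 for b in data):
        return "7-Bit ASCII Text"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "UTF-8 Text"

print(suggest_parsing_standard_id(b"plain text\r\n"))           # 7-Bit ASCII Text
print(suggest_parsing_standard_id("caf\u00e9".encode("utf-8")))   # UTF-8 Text
print(suggest_parsing_standard_id("caf\u00e9".encode("cp1252")))  # None
```

In practice the bytes would come from `open(path, "rb").read()`, so that no decoding happens before the check.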

<description>

OPTIONAL

This attribute provides a place for free-format text comments on the text file, if any.

<record_delimiter>

REQUIRED

This must have the value Carriage-Return Line-Feed. The corresponding text must also have carriage-return/linefeed delimited lines.
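
Whether a file really uses carriage-return/linefeed delimiters throughout can be tested on its raw bytes. The helpers below are a hypothetical sketch, not part of any PDS tool: `has_only_crlf` reports whether any bare CR or bare LF is present, and `to_crlf` rewrites all three common line-ending styles to CRLF.

```python
import re

# A CR not followed by LF, or an LF not preceded by CR.
_BARE_DELIMITER = re.compile(rb"\r(?!\n)|(?<!\r)\n")

def has_only_crlf(data):
    """True if every CR and LF in `data` is part of a CRLF pair."""
    return _BARE_DELIMITER.search(data) is None

def to_crlf(data):
    """Normalise LF, CR, and CRLF line endings to CRLF."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)

print(has_only_crlf(b"line one\r\nline two\r\n"))  # True
print(has_only_crlf(b"line one\nline two\r\n"))    # False
print(to_crlf(b"a\nb\rc\r\n"))                     # b'a\r\nb\r\nc\r\n'
```

Converting line endings changes the file size, so any object_length or file size recorded in the label needs to be recomputed afterwards.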