Difference between revisions of "PDF/A in PDS4 - A Primer"

From The SBN Wiki
 
(13 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
PDS4-compliant formats for documents are “flat UTF-8” text, '''PDF/A-1a''' (which is preferred), or '''PDF/A-1b'''.<ref name="PDS4 Ref"/> This page will provide an overview of PDF/A-1a and PDF/A-1b in PDS4.
 
== PDF/A Overview ==
 
== PDF/A Overview ==
 
The Portable Document Format (PDF) is a broad, complex file format. The International Organization for Standardization (ISO) has created various PDF subset standards which are specialized for different uses. The PDF/A standard (ISO 19005) is designed for long-term preservation. There are three iterations of the PDF/A standard so far: PDF/A-1, PDF/A-2, and PDF/A-3. Versions of PDF/A retain backward but not forward compatibility (e.g., a document that complies with PDF/A-2 will also conform to PDF/A-3, but not necessarily to PDF/A-1). PDS4 requires compliance specifically to the first version, PDF/A-1.<ref>[https://pds.jpl.nasa.gov/policy/format_policies_final.pdf ''Policy on Formats for PDS4 Data and Documentation'']. (2014, June 30). The Planetary Data System. From [https://pds.jpl.nasa.gov/policy/ PDS Policies]</ref>
 
The Portable Document Format (PDF) is a broad, complex file format. The International Organization for Standardization (ISO) has created various PDF subset standards which are specialized for different uses. The PDF/A standard (ISO 19005) is designed for long-term preservation. There are three iterations of the PDF/A standard so far: PDF/A-1, PDF/A-2, and PDF/A-3. Versions of PDF/A retain backward but not forward compatibility (e.g., a document that complies with PDF/A-2 will also conform to PDF/A-3, but not necessarily to PDF/A-1). PDS4 requires compliance specifically to the first version, PDF/A-1.<ref>[https://pds.jpl.nasa.gov/policy/format_policies_final.pdf ''Policy on Formats for PDS4 Data and Documentation'']. (2014, June 30). The Planetary Data System. From [https://pds.jpl.nasa.gov/policy/ PDS Policies]</ref>
Line 4: Line 5:
 
=== PDF/A-1 ===
 
=== PDF/A-1 ===
 
PDF/A-1 (ISO 19005-1) is based on PDF Version 1.4, and imposes further specifications.
 
PDF/A-1 (ISO 19005-1) is based on PDF Version 1.4, and imposes further specifications.
: PDF/A-1 files '''must''' include:
+
 
:* Embedded fonts <ref name="FAQ">[http://www.npes.org/pdf/19005-1_FAQ.pdf ''19005-1 FAQ'']. (2006, July 10). PDF/A Joint Working Group. From [http://www.npes.org/programs/standardsworkroom/toolsbestpractices/pdfa.aspx NPES PDF/A FAQ]</ref>
+
PDF/A-1 files '''must''' include:
:* Device-independent color <ref name="FAQ"/>
+
* <span style="background:#E1FFFD">Embedded fonts</span> <ref name="FAQ">[http://www.npes.org/pdf/19005-1_FAQ.pdf ''19005-1 FAQ'']. (2006, July 10). PDF/A Joint Working Group. From [http://www.npes.org/programs/standardsworkroom/toolsbestpractices/pdfa.aspx NPES PDF/A FAQ]</ref>
:* XMP metadata <ref name="FAQ"/>
+
* <span style="background:#E1FFFD">Device-independent color</span> <ref name="FAQ"/>
:** XMP (Extensible Metadata Platform) is an ISO-standardized metadata model created by Adobe. A PDF file contains its metadata in <code><x:xmpmeta></code> tags in XML-based syntax. PDF/A-1 specifies information to be included in this structure, using both predefined XMP schemas and industry-specific XMP extension schemas.
+
* <span style="background:#E1FFFD">XMP metadata</span> <ref name="FAQ"/>
:** From PDFlib: "PDF/A-1 requires XMP for identifying conforming files and supports custom metadata through XMP extension schemas. XMP support in PDF/A-1 is based on the XMP 2004 specification."<ref>[https://www.pdflib.com/knowledge-base/xmp-metadata/ XMP Metadata]. (n.d.). Retrieved July 07, 2017.</ref>
+
** XMP (Extensible Metadata Platform) is an ISO-standardized metadata model created by Adobe. A PDF file contains its metadata in <code><x:xmpmeta></code> tags in XML-based syntax. PDF/A-1 specifies information to be included in this structure, using both predefined XMP schemas and industry-specific XMP extension schemas.
: PDF/A-1 files '''may not''' include:
+
** From PDFlib: "PDF/A-1 requires XMP for identifying conforming files and supports custom metadata through XMP extension schemas. XMP support in PDF/A-1 is based on the XMP 2004 specification."<ref>[https://www.pdflib.com/knowledge-base/xmp-metadata/ XMP Metadata]. (n.d.). Retrieved July 07, 2017.</ref>
:* Encryption <ref name="FAQ"/>
+
PDF/A-1 files '''may not''' include:
:* LZW Compression <ref name="FAQ"/>
+
* <span style="background:#FFE4E1">Encryption</span> <ref name="FAQ"/>
:** LZW is a proprietary, somewhat outmoded lossless data compression algorithm.
+
* <span style="background:#FFE4E1">LZW Compression</span> <ref name="FAQ"/>
:** Flate compression, which is more efficient and is in the public domain, is generally used instead of LZW.
+
** LZW is a proprietary, somewhat outmoded lossless data compression algorithm.
:** ZIP is allowed <ref name="PDFA FAQ">McAlearney, S. (n.d.). [https://www.pdfa.org/pdfa-faq/ PDF/A FAQ]. Retrieved July 11, 2017.</ref>
+
** Flate compression, which is more efficient and is in the public domain, is generally used instead of LZW.
:** JPEG is allowed,<ref name="PDFA FAQ"/> but not JPEG2000 <ref name="Dov">Isaacs, D. (2012, May 27). Re: What are the permitted compression techniques for PDF/A-1? Retrieved July 12, 2017, from https://forums.adobe.com/thread/1012639. Various discussion board replies by Adobe employee.</ref>
+
** ZIP is allowed <ref name="PDFA FAQ">McAlearney, S. (n.d.). [https://www.pdfa.org/pdfa-faq/ PDF/A FAQ]. Retrieved July 11, 2017.</ref>
:*** Note that though ISO standard recommends lossless compression, lossy compression isn't prohibited.<ref name="Dov"/>
+
** JPEG is allowed,<ref name="PDFA FAQ"/> but not JPEG2000 <ref name="Dov">Isaacs, D. (2012, May 27). Re: What are the permitted compression techniques for PDF/A-1? Retrieved July 12, 2017, from https://forums.adobe.com/thread/1012639. Various discussion board replies by Adobe employee.</ref>
:* Embedded files <ref name="FAQ"/>
+
*** Note that though the ISO standard recommends lossless compression, lossy compression isn't prohibited.<ref name="Dov"/>
:** This refers specifically to embedded file streams, which are used to embed contents of an external file within the body of the PDF.<ref name="EFS">Van der Knijff, V. (2013, January 9). [http://openpreservation.org/blog/2013/01/09/what-do-we-mean-embedded-files-pdf/ What do we mean by “embedded” files in PDF?] [Web log post]. Retrieved July 12, 2017.</ref>
+
* <span style="background:#FFE4E1">Embedded files</span> <ref name="FAQ"/>
:** Not to be confused with "embedded" images <ref name="EFS"/>
+
** This refers specifically to embedded file streams, which are used to embed contents of an external file within the body of the PDF.<ref name="EFS">Van der Knijff, V. (2013, January 9). [http://openpreservation.org/blog/2013/01/09/what-do-we-mean-embedded-files-pdf/ What do we mean by “embedded” files in PDF?] [Web log post]. Retrieved July 12, 2017.</ref>
:** From [[Creating and Validating PDF/A-1 Documents#veraPDF|veraPDF]]'s validation rules:
+
** Not to be confused with "embedded" images.<ref name="EFS"/> PDS4 Standards Reference states that figures may be "embedded" in PDFs;<ref name="PDS4 Ref">NASA, JPL. (2017). [https://pds.nasa.gov/pds4/doc/sr/current/StdRef_1.8.0_170321_clean.pdf Planetary Data System Standards Reference Version 1.8.0] (JPL D-7669, Part 2). Pasadena, CA: Jet Propulsion Laboratory.</ref> this isn't prohibited by PDF/A-1.
:: <blockquote>A file specification dictionary, as defined in PDF 3.10.2, shall not contain the EF key [...] This key is used to encapsulate files containing arbitrary content within a PDF file. The explicit prohibition of EF key has the implicit effect of disallowing embedded files that can create external dependencies and complicate preservation efforts.<ref>VeraPDF Consortium. (2016, December 23). [https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Part-1-rules PDF/A-1 validation rules]. Retrieved July 12, 2017.</ref></blockquote>
+
** From [[Creating and Validating PDF/A-1 Documents#veraPDF|veraPDF]]'s validation rules:
:* External content references <ref name="FAQ"/>
+
: <blockquote>A file specification dictionary, as defined in PDF 3.10.2, shall not contain the EF key [...] This key is used to encapsulate files containing arbitrary content within a PDF file. The explicit prohibition of EF key has the implicit effect of disallowing embedded files that can create external dependencies and complicate preservation efforts.<ref name="Rules">VeraPDF Consortium. (2016, December 23). [https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Part-1-rules PDF/A-1 validation rules]. Retrieved July 12, 2017.</ref></blockquote>
:* PDF Transparency <ref name="FAQ"/>
+
* <span style="background:#FFE4E1">External content references</span> <ref name="FAQ"/>
:* Multi-media <ref name="FAQ"/>
+
* <span style="background:#FFE4E1">PDF Transparency</span> <ref name="FAQ"/>
:* JavaScript <ref name="FAQ"/>
+
* <span style="background:#FFE4E1">Multi-media</span> <ref name="FAQ"/>
 +
* <span style="background:#FFE4E1">JavaScript</span> <ref name="FAQ"/>
 +
** From veraPDF: "JavaScript actions permit an arbitrary executable code that has the potential to interfere with reliable and predictable rendering."<ref name="Rules"/>
 +
 
  
 
There are two levels of conformance:
 
There are two levels of conformance:
:; PDF/A-1b — Level B (Basic) Conformance  
+
; PDF/A-1b — Level B (Basic) Conformance  
:* ''This is the minimum conformance level required for PDS4.''
+
* ''This is the minimum conformance level required for PDS4.''
:* encompasses ISO 19005-1 requirements "regarding the visual appearance of electronic documents, but not their structural or semantic properties" <ref name="ISO 19005-1">''ISO 19005-1:2005(en) Document management — Electronic document file format for long-term preservation — Part 1: Use of PDF 1.4 (PDF/A-1)''. (2005). Geneva: ISO. Preview only, retrieved from https://www.iso.org/obp/ui/#iso:std:iso:19005:-1:ed-1:v2:en.</ref>
+
* Encompasses ISO 19005-1 requirements "regarding the visual appearance of electronic documents, but not their structural or semantic properties" <ref name="ISO 19005-1">''ISO 19005-1:2005(en) Document management — Electronic document file format for long-term preservation — Part 1: Use of PDF 1.4 (PDF/A-1)''. (2005). Geneva: ISO. Preview only, retrieved from https://www.iso.org/obp/ui/#iso:std:iso:19005:-1:ed-1:v2:en.</ref>
:; PDF/A-1a — Level A (Accessible) Conformance
+
; PDF/A-1a — Level A (Accessible) Conformance
:* ''This is the much preferred conformance level for PDS4.''
+
* ''This is the much preferred conformance level for PDS4.''
:* encompasses all requirements of ISO 19005-1 <ref name="ISO 19005-1"/>
+
* Encompasses all requirements of ISO 19005-1 <ref name="ISO 19005-1"/>
:* Not only does this level better accommodate people with disabilities, it's also recommended over Level B "for an exact text searchability, text extraction, and for the reuse of content."<ref name="PDFA FAQ"/>
+
* Recommended over Level B "for an exact text searchability, text extraction, and for the reuse of content."<ref name="PDFA FAQ"/>
 +
 
  
 
Note that it's not immediately apparent whether a PDF file conforms to a standard. All PDF documents are defined by the same file format, after all, and thus they all use the <code>.pdf</code> filename extension. A document that is intentionally standard-compliant should, however, have XMP metadata which specifies its standard. A file which claims to conform to PDF/A-1a, for example, will include the following tags in the <code>pdfaid</code> namespace:
 
Note that it's not immediately apparent whether a PDF file conforms to a standard. All PDF documents are defined by the same file format, after all, and thus they all use the <code>.pdf</code> filename extension. A document that is intentionally standard-compliant should, however, have XMP metadata which specifies its standard. A file which claims to conform to PDF/A-1a, for example, will include the following tags in the <code>pdfaid</code> namespace:
Line 44: Line 49:
 
|}
 
|}
 
A PDF reader may make note of this designation. Adobe Acrobat, in particular, displays a blue banner which states that the file "claims compliance with the PDF/A standard." Regardless, this claim of compliance must be [[Creating and Validating PDF/A-1 Documents#Validation|validated]]. A non-compliant document may incorrectly have metadata indicating compliance; conversely, compliance metadata may be missing from an otherwise compliant document.
 
A PDF reader may make note of this designation. Adobe Acrobat, in particular, displays a blue banner which states that the file "claims compliance with the PDF/A standard." Regardless, this claim of compliance must be [[Creating and Validating PDF/A-1 Documents#Validation|validated]]. A non-compliant document may incorrectly have metadata indicating compliance; conversely, compliance metadata may be missing from an otherwise compliant document.
 +
 +
== Usage in PDS4 ==
 +
You may provide the same document in additional formats if helpful. However, all scientifically useful information in a supplemental format ''must'' be included in a PDS4-compliant format. If you create a PDF/A document from a Microsoft Word document, for example, then you may want to include the original Microsoft Word version as well.
 +
 +
The running list of PDS-approved supplemental formats for data and documentation is as follows:
 +
{| class="wikitable"
 +
!colspan="2"|Supplemental Formats
 +
|-
 +
!colspan="1"|Data
 +
!colspan="1"|Documentation
 +
|-
 +
|!rowspan="1"|
 +
* CCSDS Space Communications Protocols
 +
* GIF
 +
* J2C (JPEG2000 compressed image)
 +
* JPEG
 +
* PDF
 +
* PNG
 +
* SEED 2.4
 +
* TIFF
 +
|!rowspan="1"|
 +
* EPS (Encapsulated PostScript)
 +
* HTML 2.0, 3.2, 4.0, and 4.01
 +
* LaTeX
 +
* Microsoft Word
 +
* PDF
 +
* PostScript
 +
* Rich Text
 +
|}
 +
''Last update: 23 August 2017.''
  
 
== See Also ==
 
== See Also ==
Line 61: Line 96:
 
== References ==
 
== References ==
 
<references />
 
<references />
 +
 +
 +
 +
[[Category:PDF]]
 +
[[Category:Primers]]

Latest revision as of 15:58, 23 August 2017

PDS4-compliant formats for documents are “flat UTF-8” text, PDF/A-1a (which is preferred), or PDF/A-1b.[1] This page provides an overview of PDF/A-1a and PDF/A-1b in PDS4.

PDF/A Overview

The Portable Document Format (PDF) is a broad, complex file format. The International Organization for Standardization (ISO) has created various PDF subset standards which are specialized for different uses. The PDF/A standard (ISO 19005) is designed for long-term preservation. There are three iterations of the PDF/A standard so far: PDF/A-1, PDF/A-2, and PDF/A-3. Versions of PDF/A retain backward but not forward compatibility (e.g., a document that complies with PDF/A-2 will also conform to PDF/A-3, but not necessarily to PDF/A-1). PDS4 requires compliance specifically with the first version, PDF/A-1.[2]

PDF/A-1

PDF/A-1 (ISO 19005-1) is based on PDF Version 1.4, and imposes further specifications.

PDF/A-1 files must include:

  • Embedded fonts [3]
  • Device-independent color [3]
  • XMP metadata [3]
    • XMP (Extensible Metadata Platform) is an ISO-standardized metadata model created by Adobe. A PDF file contains its metadata in <x:xmpmeta> tags in XML-based syntax. PDF/A-1 specifies information to be included in this structure, using both predefined XMP schemas and industry-specific XMP extension schemas.
    • From PDFlib: "PDF/A-1 requires XMP for identifying conforming files and supports custom metadata through XMP extension schemas. XMP support in PDF/A-1 is based on the XMP 2004 specification."[4]

PDF/A-1 files may not include:

  • Encryption [3]
  • LZW Compression [3]
    • LZW is an older lossless data compression algorithm that was encumbered by patents until they expired in 2003 and 2004.
    • Flate compression, which is more efficient and is in the public domain, is generally used instead of LZW.
    • ZIP is allowed [5]
    • JPEG is allowed,[5] but not JPEG2000 [6]
      • Note that though the ISO standard recommends lossless compression, lossy compression isn't prohibited.[6]
  • Embedded files [3]
    • This refers specifically to embedded file streams, which are used to embed contents of an external file within the body of the PDF.[7]
    • Not to be confused with "embedded" images.[7] PDS4 Standards Reference states that figures may be "embedded" in PDFs;[1] this isn't prohibited by PDF/A-1.
    • From veraPDF's validation rules:

A file specification dictionary, as defined in PDF 3.10.2, shall not contain the EF key [...] This key is used to encapsulate files containing arbitrary content within a PDF file. The explicit prohibition of EF key has the implicit effect of disallowing embedded files that can create external dependencies and complicate preservation efforts.[8]

  • External content references [3]
  • PDF Transparency [3]
  • Multi-media [3]
  • JavaScript [3]
    • From veraPDF: "JavaScript actions permit an arbitrary executable code that has the potential to interfere with reliable and predictable rendering."[8]
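
As a rough illustration of the prohibitions listed above, the following Python sketch scans the raw bytes of a PDF for a handful of name tokens that usually accompany some of the prohibited features. The function name find_suspect_features and the SUSPECT_TOKENS table are hypothetical and are not part of any PDS or veraPDF tool. The scan is only a heuristic: it cannot see inside compressed streams, it does not cover transparency, multimedia, or external content references, and it is no substitute for a real validator.

```python
import re

# Hypothetical table: PDF name token -> the PDF/A-1 restriction it hints at.
SUSPECT_TOKENS = {
    b"/Encrypt": "encryption",
    b"/LZWDecode": "LZW compression",
    b"/JPXDecode": "JPEG2000 compression",
    b"/EF": "embedded file stream",
    b"/JavaScript": "JavaScript",
    b"/JS": "JavaScript",
}


def find_suspect_features(pdf_bytes):
    """Return a sorted list of restriction labels whose tokens appear in the bytes."""
    hits = set()
    for token, label in SUSPECT_TOKENS.items():
        # The lookahead keeps /EF from matching longer names such as /EFoo.
        pattern = re.escape(token) + rb"(?![A-Za-z0-9_.#-])"
        if re.search(pattern, pdf_bytes):
            hits.add(label)
    return sorted(hits)
```

Reading a file with open(path, "rb").read() and passing the result to find_suspect_features gives a quick first look; an empty list does not show that the file is compliant.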


There are two levels of conformance:

PDF/A-1b — Level B (Basic) Conformance
  • This is the minimum conformance level required for PDS4.
  • Encompasses ISO 19005-1 requirements "regarding the visual appearance of electronic documents, but not their structural or semantic properties" [9]
PDF/A-1a — Level A (Accessible) Conformance
  • This is the much preferred conformance level for PDS4.
  • Encompasses all requirements of ISO 19005-1 [9]
  • Recommended over Level B "for an exact text searchability, text extraction, and for the reuse of content."[5]


Note that it's not immediately apparent whether a PDF file conforms to a standard. All PDF documents are defined by the same file format, after all, and thus they all use the .pdf filename extension. A document that is intentionally standard-compliant should, however, have XMP metadata which specifies its standard. A file which claims to conform to PDF/A-1a, for example, will include the following tags in the pdfaid namespace:

<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>A</pdfaid:conformance>

A PDF reader may make note of this designation. Adobe Acrobat, in particular, displays a blue banner which states that the file "claims compliance with the PDF/A standard." Regardless, this claim of compliance must be validated. A non-compliant document may incorrectly have metadata indicating compliance; conversely, compliance metadata may be missing from an otherwise compliant document.
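
The claim described above can be read without a PDF library. The following Python sketch searches the raw bytes of a file for the pdfaid:part and pdfaid:conformance values, accepting both the element form shown above and the equivalent attribute form. The function name claimed_pdfa_level is hypothetical, and the sketch assumes the XMP packet is stored uncompressed as plain text. It reports only what the file claims, which is not the same as validating it.

```python
import re

# Matches pdfaid:part / pdfaid:conformance written either as an XML element
# (<pdfaid:part>1</pdfaid:part>) or as an attribute (pdfaid:part="1").
_PDFAID = re.compile(
    rb"pdfaid:(part|conformance)\s*"
    rb"(?:>\s*([^<\s]+)\s*<|=\s*[\"']([^\"']+)[\"'])"
)


def claimed_pdfa_level(pdf_bytes):
    """Return the claimed level, e.g. 'PDF/A-1a', or None if no claim is found."""
    found = {}
    for match in _PDFAID.finditer(pdf_bytes):
        key = match.group(1).decode("ascii")
        value = (match.group(2) or match.group(3)).decode("ascii", "replace")
        # Keep the first value seen for each key.
        found.setdefault(key, value)
    if "part" not in found or "conformance" not in found:
        return None
    return "PDF/A-{}{}".format(found["part"], found["conformance"].lower())
```

A result of None means only that no claim was found in plain text; as noted above, such a file may still be compliant, and a file with a claim may still fail validation.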

Usage in PDS4

You may provide the same document in additional formats if helpful. However, all scientifically useful information in a supplemental format must be included in a PDS4-compliant format. If you create a PDF/A document from a Microsoft Word document, for example, then you may want to include the original Microsoft Word version as well.

The running list of PDS-approved supplemental formats for data and documentation is as follows:

Supplemental Formats

Data
  • CCSDS Space Communications Protocols
  • GIF
  • J2C (JPEG2000 compressed image)
  • JPEG
  • PDF
  • PNG
  • SEED 2.4
  • TIFF

Documentation
  • EPS (Encapsulated PostScript)
  • HTML 2.0, 3.2, 4.0, and 4.01
  • LaTeX
  • Microsoft Word
  • PDF
  • PostScript
  • Rich Text

Last update: 23 August 2017.

See Also

External Links

General
Backend

References

  1. NASA, JPL. (2017). Planetary Data System Standards Reference Version 1.8.0 (JPL D-7669, Part 2). Pasadena, CA: Jet Propulsion Laboratory.
  2. Policy on Formats for PDS4 Data and Documentation. (2014, June 30). The Planetary Data System. From PDS Policies.
  3. 19005-1 FAQ. (2006, July 10). PDF/A Joint Working Group. From NPES PDF/A FAQ.
  4. XMP Metadata. (n.d.). Retrieved July 07, 2017.
  5. McAlearney, S. (n.d.). PDF/A FAQ. Retrieved July 11, 2017.
  6. Isaacs, D. (2012, May 27). Re: What are the permitted compression techniques for PDF/A-1? Retrieved July 12, 2017, from https://forums.adobe.com/thread/1012639. Various discussion board replies by Adobe employee.
  7. Van der Knijff, V. (2013, January 9). What do we mean by “embedded” files in PDF? [Web log post]. Retrieved July 12, 2017.
  8. VeraPDF Consortium. (2016, December 23). PDF/A-1 validation rules. Retrieved July 12, 2017.
  9. ISO 19005-1:2005(en) Document management — Electronic document file format for long-term preservation — Part 1: Use of PDF 1.4 (PDF/A-1). (2005). Geneva: ISO. Preview only, retrieved from https://www.iso.org/obp/ui/#iso:std:iso:19005:-1:ed-1:v2:en.