PDF/A in PDS4 - A Primer
Latest revision as of 15:58, 23 August 2017
PDS4-compliant formats for documents are “flat UTF-8” text, PDF/A-1a (which is preferred), or PDF/A-1b.[1] This page provides an overview of PDF/A-1a and PDF/A-1b in PDS4.
PDF/A Overview
The Portable Document Format (PDF) is a broad, complex file format. The International Organization for Standardization (ISO) has created various PDF subset standards which are specialized for different uses. The PDF/A standard (ISO 19005) is designed for long-term preservation. There are three iterations of the PDF/A standard so far: PDF/A-1, PDF/A-2, and PDF/A-3. Versions of PDF/A retain backward but not forward compatibility (e.g., a document that complies with PDF/A-2 will also conform to PDF/A-3, but not necessarily to PDF/A-1). PDS4 requires compliance specifically with the first version, PDF/A-1.[2]
PDF/A-1
PDF/A-1 (ISO 19005-1) is based on PDF Version 1.4, and imposes further specifications.
PDF/A-1 files must include:
- Embedded fonts [3]
- Device-independent color [3]
- XMP metadata [3]
- XMP (Extensible Metadata Platform) is an ISO-standardized metadata model created by Adobe. A PDF file contains its metadata in <x:xmpmeta> tags in XML-based syntax. PDF/A-1 specifies information to be included in this structure, using both predefined XMP schemas and industry-specific XMP extension schemas.
- From PDFlib: "PDF/A-1 requires XMP for identifying conforming files and supports custom metadata through XMP extension schemas. XMP support in PDF/A-1 is based on the XMP 2004 specification."[4]
PDF/A-1 files may not include:
- Encryption [3]
- LZW Compression [3]
- LZW is an older lossless data compression algorithm that was patent-encumbered for years; the patents have since expired.
- Flate compression, which is more efficient and is in the public domain, is generally used instead of LZW.
- ZIP is allowed [5]
- JPEG is allowed,[5] but not JPEG2000 [6]
- Note that though the ISO standard recommends lossless compression, lossy compression isn't prohibited.[6]
- Embedded files [3]
- This refers specifically to embedded file streams, which are used to embed contents of an external file within the body of the PDF.[7]
- Not to be confused with "embedded" images.[7] PDS4 Standards Reference states that figures may be "embedded" in PDFs;[1] this isn't prohibited by PDF/A-1.
- From veraPDF's validation rules: "A file specification dictionary, as defined in PDF 3.10.2, shall not contain the EF key [...] This key is used to encapsulate files containing arbitrary content within a PDF file. The explicit prohibition of EF key has the implicit effect of disallowing embedded files that can create external dependencies and complicate preservation efforts."[8]
- External content references [3]
- PDF Transparency [3]
- Multimedia [3]
- JavaScript [3]
- From veraPDF: "JavaScript actions permit an arbitrary executable code that has the potential to interfere with reliable and predictable rendering."[8]
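As a rough illustration of the prohibited features listed above, the following Python sketch scans the raw bytes of a PDF for name tokens that usually accompany some of them. The token-to-feature mapping and the function name `find_suspect_features` are assumptions made for this example; they are not taken from ISO 19005-1 or from any validator. A scan like this can miss features (for instance, names written with escape sequences) and can report false positives, so it is only a quick pre-check and does not replace validation.

```python
import re
from typing import List

# Name tokens that commonly accompany features PDF/A-1 prohibits.
# This mapping is an illustrative assumption, not a list from the standard.
SUSPECT_TOKENS = {
    b"/Encrypt": "encryption",
    b"/LZWDecode": "LZW compression",
    b"/JPXDecode": "JPEG2000 compression",
    b"/EmbeddedFiles": "embedded files",
    b"/JavaScript": "JavaScript",
}


def find_suspect_features(pdf_bytes: bytes) -> List[str]:
    """Return labels for suspect tokens that appear in the raw PDF bytes."""
    found = []
    for token, label in SUSPECT_TOKENS.items():
        # The lookahead keeps a token from matching as the prefix of a
        # longer name (for example /Encrypt inside /EncryptMetadata).
        pattern = re.escape(token) + rb"(?![A-Za-z0-9])"
        if re.search(pattern, pdf_bytes):
            found.append(label)
    return found
```

To try it on a file, read the file in binary mode and pass the bytes, e.g. `find_suspect_features(open("document.pdf", "rb").read())`, where `document.pdf` is a placeholder name. An empty result does not establish compliance.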
There are two levels of conformance:
- PDF/A-1b — Level B (Basic) Conformance
- This is the minimum conformance level required for PDS4.
- Encompasses ISO 19005-1 requirements "regarding the visual appearance of electronic documents, but not their structural or semantic properties" [9]
- PDF/A-1a — Level A (Accessible) Conformance
- This is the much preferred conformance level for PDS4.
- Encompasses all requirements of ISO 19005-1 [9]
- Recommended over Level B "for an exact text searchability, text extraction, and for the reuse of content."[5]
Note that it's not immediately apparent whether a PDF file conforms to a standard. All PDF documents are defined by the same file format, after all, and thus they all use the .pdf filename extension. A document that is intentionally standard-compliant should, however, have XMP metadata which specifies its standard. A file which claims to conform to PDF/A-1a, for example, will include the following tags in the pdfaid namespace:
<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>A</pdfaid:conformance>
A PDF reader may make note of this designation. Adobe Acrobat, in particular, displays a blue banner which states that the file "claims compliance with the PDF/A standard." Regardless, this claim of compliance must be validated. A non-compliant document may incorrectly have metadata indicating compliance; conversely, compliance metadata may be missing from an otherwise compliant document.
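The claim itself can be read programmatically. The sketch below is a minimal example, assuming the XMP packet is stored uncompressed in the file so that the pdfaid tags are visible in the raw bytes; it also accepts the attribute form (pdfaid:part="1") that XMP serializations may use. The function names `read_pdfa_claim` and `_find` are hypothetical. The result only reports what the file claims, which, as noted above, still has to be validated.

```python
import re
from typing import Optional, Tuple


def _find(name: str, text: str) -> Optional[str]:
    """Look up one pdfaid property in element form, then attribute form."""
    patterns = (
        rf"<pdfaid:{name}>\s*([^<\s]+)\s*</pdfaid:{name}>",
        rf"pdfaid:{name}\s*=\s*[\"']([^\"']+)[\"']",
    )
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return None


def read_pdfa_claim(pdf_bytes: bytes) -> Optional[Tuple[str, str]]:
    """Return (part, conformance) as claimed in the metadata, or None."""
    # latin-1 maps every byte to one character, so decoding never fails;
    # the tags being searched for are plain ASCII.
    text = pdf_bytes.decode("latin-1")
    part = _find("part", text)
    conformance = _find("conformance", text)
    if part is None or conformance is None:
        return None
    return part, conformance.upper()
```

For a file carrying the two tags shown earlier, `read_pdfa_claim` returns `("1", "A")`. A return value of `None` means only that no claim was found in the raw bytes.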
Usage in PDS4
You may provide the same document in additional formats if helpful. However, all scientifically useful information in a supplemental format must be included in a PDS4-compliant format. If you create a PDF/A document from a Microsoft Word document, for example, then you may want to include the original Microsoft Word version as well.
The running list of PDS-approved supplemental formats for data and documentation is as follows:
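Supplemental Formats | |
---|---|
Data | Documentation |
CCSDS Space Communications Protocols | EPS (Encapsulated PostScript) |
GIF | HTML 2.0, 3.2, 4.0, and 4.01 |
J2C (JPEG2000 compressed image) | LaTeX |
JPEG | Microsoft Word |
PDF | PDF |
PNG | PostScript |
SEED 2.4 | Rich Text |
TIFF | |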
Last update: 23 August 2017.
References
- [1] NASA, JPL. (2017). Planetary Data System Standards Reference Version 1.8.0 (JPL D-7669, Part 2). Pasadena, CA: Jet Propulsion Laboratory.
- [2] Policy on Formats for PDS4 Data and Documentation. (2014, June 30). The Planetary Data System. From PDS Policies.
- [3] 19005-1 FAQ. (2006, July 10). PDF/A Joint Working Group. From NPES PDF/A FAQ.
- [4] XMP Metadata. (n.d.). Retrieved July 07, 2017.
- [5] McAlearney, S. (n.d.). PDF/A FAQ. Retrieved July 11, 2017.
- [6] Isaacs, D. (2012, May 27). Re: What are the permitted compression techniques for PDF/A-1? Retrieved July 12, 2017, from https://forums.adobe.com/thread/1012639. Various discussion board replies by Adobe employee.
- [7] Van der Knijff, V. (2013, January 9). What do we mean by “embedded” files in PDF? [Web log post]. Retrieved July 12, 2017.
- [8] VeraPDF Consortium. (2016, December 23). PDF/A-1 validation rules. Retrieved July 12, 2017.
- [9] ISO 19005-1:2005(en) Document management — Electronic document file format for long-term preservation — Part 1: Use of PDF 1.4 (PDF/A-1). (2005). Geneva: ISO. Preview only, retrieved from https://www.iso.org/obp/ui/#iso:std:iso:19005:-1:ed-1:v2:en.