PDF/A in PDS4 - A Primer
PDF/A Overview
The Portable Document Format (PDF) is a broad, complex file format. The International Organization for Standardization (ISO) has created various PDF subset standards that are specialized for different uses. The PDF/A standard (ISO 19005) is designed for long-term preservation. There are three iterations of the PDF/A standard so far: PDF/A-1, PDF/A-2, and PDF/A-3. Versions of PDF/A retain backward but not forward compatibility (e.g., a document that complies with PDF/A-2 will also conform to PDF/A-3, but not necessarily to PDF/A-1). PDS4 requires compliance specifically with the first version, PDF/A-1.[1]
PDF/A-1
PDF/A-1 (ISO 19005-1) is based on PDF version 1.4 and imposes additional requirements and restrictions on it.
PDF/A-1 files must include:
- Embedded fonts [2]
- Device-independent color [2]
- XMP metadata [2]
- XMP (Extensible Metadata Platform) is an ISO-standardized metadata model created by Adobe. A PDF file contains its metadata in <x:xmpmeta> tags in XML-based syntax. PDF/A-1 specifies information to be included in this structure, using both predefined XMP schemas and industry-specific XMP extension schemas.
- From PDFlib: "PDF/A-1 requires XMP for identifying conforming files and supports custom metadata through XMP extension schemas. XMP support in PDF/A-1 is based on the XMP 2004 specification."[3]
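As a rough illustration of where this metadata lives, the sketch below searches the raw bytes of a file for an <x:xmpmeta> element. It assumes the metadata stream is stored as plain, unfiltered text; the function name `extract_xmp` and the abbreviated sample bytes are hypothetical and only stand in for a real PDF file.

```python
import re
from typing import Optional

# Matches the first XMP packet body, from <x:xmpmeta ...> to </x:xmpmeta>.
XMP_PATTERN = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def extract_xmp(pdf_bytes: bytes) -> Optional[str]:
    """Return the first XMP packet found in the raw bytes, or None."""
    match = XMP_PATTERN.search(pdf_bytes)
    if match is None:
        return None
    return match.group(0).decode("utf-8", errors="replace")


# Hypothetical, heavily abbreviated stand-in for the contents of a PDF file.
sample = (
    b"%PDF-1.4\n1 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"><rdf:RDF xmlns:rdf='
    b'"http://www.w3.org/1999/02/22-rdf-syntax-ns#"/></x:xmpmeta>\n'
    b"endstream\nendobj\n"
)
xmp = extract_xmp(sample)
```

A byte search like this is only a convenience for inspection. A PDF library that resolves the document catalog's metadata stream would be the more robust route.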
PDF/A-1 files may not include:
- Encryption [2]
- LZW Compression [2]
- LZW is a somewhat outmoded lossless data compression algorithm that was long encumbered by patents (which have since expired).
- Flate compression, which is more efficient and is in the public domain, is generally used instead of LZW.
- ZIP is allowed [4]
- JPEG is allowed,[4] but not JPEG2000 [5]
- Note that though the ISO standard recommends lossless compression, lossy compression isn't prohibited.[5]
- Embedded files [2]
- This refers specifically to embedded file streams, which are used to embed contents of an external file within the body of the PDF.[6]
- Not to be confused with "embedded" images.[6] PDS4 Standards Reference states that figures may be "embedded" in PDFs;[7] this isn't prohibited by PDF/A-1.
- From veraPDF's validation rules: "A file specification dictionary, as defined in PDF 3.10.2, shall not contain the EF key [...] This key is used to encapsulate files containing arbitrary content within a PDF file. The explicit prohibition of EF key has the implicit effect of disallowing embedded files that can create external dependencies and complicate preservation efforts."[8]
- External content references [2]
- PDF Transparency [2]
- Multi-media [2]
- JavaScript [2]
- From veraPDF: "JavaScript actions permit an arbitrary executable code that has the potential to interfere with reliable and predictable rendering."[8]
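Several of the prohibited features above correspond to well-known PDF name tokens (/Encrypt, /LZWDecode, /JPXDecode, /EF, /JavaScript), so a quick textual scan can flag obvious problems before a full validation. The following is a minimal sketch under that assumption, not a validator: it misses names hidden inside compressed data, may flag names that merely appear inside text strings, and does not check transparency, multimedia, or external references. The function name and sample bytes are hypothetical.

```python
import re
from typing import List

# PDF name tokens associated with features that PDF/A-1 excludes.
PROHIBITED_MARKERS = {
    "encryption": rb"/Encrypt\b",
    "LZW compression": rb"/LZWDecode\b",
    "JPEG2000 compression": rb"/JPXDecode\b",
    "embedded files": rb"/(?:EF|EmbeddedFiles)\b",
    "JavaScript": rb"/(?:JavaScript|JS)\b",
}


def find_prohibited_markers(pdf_bytes: bytes) -> List[str]:
    """Return labels of prohibited features whose name tokens appear in the bytes."""
    return [
        label
        for label, pattern in PROHIBITED_MARKERS.items()
        if re.search(pattern, pdf_bytes)
    ]


# Hypothetical fragments standing in for real file contents.
suspect = b"<< /Filter /LZWDecode >>\n<< /S /JavaScript /JS (app.alert(1)) >>"
clean = b"<< /Filter /FlateDecode >>\n<< /Filter /DCTDecode >>"

suspect_hits = find_prohibited_markers(suspect)
clean_hits = find_prohibited_markers(clean)
```

Here /FlateDecode and /DCTDecode (the filters for Flate/ZIP and JPEG compression) are not flagged, which matches the allowances noted in the list.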
There are two levels of conformance:
- PDF/A-1b — Level B (Basic) Conformance
- This is the minimum conformance level required for PDS4.
- Encompasses ISO 19005-1 requirements "regarding the visual appearance of electronic documents, but not their structural or semantic properties" [9]
- PDF/A-1a — Level A (Accessible) Conformance
- This is the strongly preferred conformance level for PDS4.
- Encompasses all requirements of ISO 19005-1 [9]
- Not only does this level better accommodate people with disabilities, it's also recommended over Level B "for an exact text searchability, text extraction, and for the reuse of content."[4]
Note that it's not immediately apparent whether a PDF file conforms to a standard. All PDF documents are defined by the same file format, after all, and thus they all use the .pdf filename extension. A document that is intentionally standard-compliant should, however, have XMP metadata which specifies its standard. A file which claims to conform to PDF/A-1a, for example, will include the following tags in the pdfaid namespace:
<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>A</pdfaid:conformance>
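Reading this claim programmatically might look like the sketch below, which parses an XMP packet with Python's standard XML parser and returns the declared part and conformance level. The function name `claimed_pdfa_level` and the sample packet are hypothetical; the sketch also accepts the attribute form of the two properties, since XMP permits simple properties to be written either way.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def claimed_pdfa_level(xmp_xml: str) -> Optional[Tuple[str, str]]:
    """Return (part, conformance) as declared in the XMP packet, or None."""
    root = ET.fromstring(xmp_xml)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Each property may appear as a child element or as an attribute.
        part = desc.findtext(f"{{{PDFAID_NS}}}part") or desc.get(f"{{{PDFAID_NS}}}part")
        level = (desc.findtext(f"{{{PDFAID_NS}}}conformance")
                 or desc.get(f"{{{PDFAID_NS}}}conformance"))
        if part and level:
            return part.strip(), level.strip()
    return None


# Hypothetical minimal packet claiming PDF/A-1a.
sample_xmp = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>1</pdfaid:part>
      <pdfaid:conformance>A</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

claim = claimed_pdfa_level(sample_xmp)
# A claim that would satisfy the PDS4 requirement described above, if validated.
claims_pdfa1 = claim is not None and claim[0] == "1" and claim[1] in {"A", "B"}
```

The result only reports what the file says about itself; it says nothing about whether the file actually meets the standard.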
A PDF reader may make note of this designation. Adobe Acrobat, in particular, displays a blue banner which states that the file "claims compliance with the PDF/A standard." Regardless, this claim of compliance must be validated. A non-compliant document may incorrectly have metadata indicating compliance; conversely, compliance metadata may be missing from an otherwise compliant document.
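One way to perform such a validation is with a dedicated validator such as veraPDF. The sketch below wraps a command-line call from Python. It assumes the veraPDF CLI is installed and available on the PATH as `verapdf`, and that it accepts a `--flavour` option naming the profile (for example `1b` or `1a`); confirm both against `verapdf --help` for the installed version. The helper names are hypothetical.

```python
import subprocess
from typing import List


def verapdf_command(pdf_path: str, flavour: str = "1b") -> List[str]:
    """Build the command line for a validation run (nothing is executed here)."""
    # Assumption: the executable is named `verapdf` and supports --flavour.
    return ["verapdf", "--flavour", flavour, pdf_path]


def run_verapdf(pdf_path: str, flavour: str = "1b") -> str:
    """Run the validator and return its report text for inspection."""
    result = subprocess.run(
        verapdf_command(pdf_path, flavour),
        capture_output=True,
        text=True,
        check=False,  # leave interpretation of the exit status to the caller
    )
    return result.stdout


# Example command for checking a hypothetical file against PDF/A-1b.
command = verapdf_command("document.pdf")
```

Validating against `1b` checks the minimum level required for PDS4; passing `1a` instead would check the preferred level.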
References
- [1] Policy on Formats for PDS4 Data and Documentation. (2014, June 30). The Planetary Data System. From PDS Policies.
- [2] 19005-1 FAQ. (2006, July 10). PDF/A Joint Working Group. From NPES PDF/A FAQ.
- [3] XMP Metadata. (n.d.). Retrieved July 7, 2017.
- [4] McAlearney, S. (n.d.). PDF/A FAQ. Retrieved July 11, 2017.
- [5] Isaacs, D. (2012, May 27). Re: What are the permitted compression techniques for PDF/A-1? Retrieved July 12, 2017, from https://forums.adobe.com/thread/1012639. Various discussion board replies by Adobe employee.
- [6] Van der Knijff, V. (2013, January 9). What do we mean by "embedded" files in PDF? [Web log post]. Retrieved July 12, 2017.
- [7] NASA, JPL. (2017). Planetary Data System Standards Reference Version 1.8.0 (JPL D-7669, Part 2). Pasadena, CA: Jet Propulsion Laboratory.
- [8] VeraPDF Consortium. (2016, December 23). PDF/A-1 validation rules. Retrieved July 12, 2017.
- [9] ISO 19005-1:2005(en) Document management — Electronic document file format for long-term preservation — Part 1: Use of PDF 1.4 (PDF/A-1). (2005). Geneva: ISO. Preview only, retrieved from https://www.iso.org/obp/ui/#iso:std:iso:19005:-1:ed-1:v2:en.