JPEG 2000 as a Preservation Format for Digital Raster Images at Library and Archives Canada
JPEG2000 Preservation File Format Working Group
Library and Archives Canada
June 2008
This paper proposes that Library and Archives Canada (LAC) adopt baseline JPEG2000 as a preservation format for digital raster images. In support of this proposal, JPEG2000 is summarized and assessed as a partial solution to LAC's digital storage challenges that also offers added access features and bandwidth savings. The format is also weighed against preservation format criteria established by LAC to ensure the long-term safeguarding of the organization's digital image collection. A bibliography and an environmental scan of peer memory institutions are included as appendices.
The need to find a viable alternative to large, uncompressed TIFF files as a preservation format for digital raster images is a growing concern for Library and Archives Canada (LAC). The availability of adequate storage space is always a critical issue, and LAC's digital collection is poised to grow dramatically as the organization initiates large-scale digitization efforts to broaden online access to its analog holdings. LAC is not alone in this regard: other memory institutions internationally, among them the National Library of the Netherlands and the National Library of Norway, have investigated alternative, more storage-efficient file formats to use in place of the industry-standard TIFF.
The comparatively recent JPEG2000 standard, developed by the same joint committee that published the original JPEG standard in 1992, combines preservation-quality lossless compression with substantially reduced storage requirements (typically a 2:1 ratio compared to uncompressed TIFF, with no loss of image quality1). In addition, the ability to 'extract on demand' lower-resolution images from a single lossless master offers further storage efficiency, eliminating the need to store multiple copies of the same image at different resolutions for preservation and access purposes.
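The storage argument can be made concrete with simple arithmetic. The sketch below assumes a hypothetical 6000 × 4000 pixel, 24-bit RGB scan, applies the typical 2:1 lossless ratio cited above (actual ratios vary with image content), and lists the reduced resolutions that a decoder could extract from the one master. The five wavelet decomposition levels are an assumption (a common encoder setting, not an LAC parameter), and the function names are hypothetical.

```python
from typing import List, Tuple


def uncompressed_bytes(width: int, height: int, channels: int = 3, bits_per_sample: int = 8) -> int:
    """Raw pixel payload of an uncompressed raster (file headers and metadata ignored)."""
    return width * height * channels * bits_per_sample // 8


def resolution_levels(width: int, height: int, levels: int = 5) -> List[Tuple[int, int]]:
    """Image sizes decodable from one master encoded with `levels` decomposition levels.

    Each level halves both dimensions, rounding up (image assumed anchored at the origin).
    """
    return [(-(-width // 2 ** r), -(-height // 2 ** r)) for r in range(levels + 1)]


ASSUMED_LOSSLESS_RATIO = 2.0  # typical ratio cited in the text; real ratios depend on content

w, h = 6000, 4000  # hypothetical 24-megapixel RGB scan
tiff = uncompressed_bytes(w, h)
jp2 = tiff / ASSUMED_LOSSLESS_RATIO
print(f"TIFF master: {tiff / 1e6:.0f} MB, JPEG2000 master: {jp2 / 1e6:.0f} MB")
print(resolution_levels(w, h))
```

Under these assumptions a million such scans would occupy roughly 72 TB as uncompressed TIFF and about 36 TB as lossless JPEG2000, and the same JPEG2000 file can serve access copies at 3000 × 2000, 1500 × 1000, and smaller sizes without separately stored derivatives.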
JPEG2000 offers other attractive features for collection dissemination as well. Progressive transmission, for example, allows a low-resolution image to be displayed to a recipient once only a fraction of the whole file has been transmitted, gradually increasing in refinement until the full resolution is displayed. As well, region of interest (ROI) decoding permits portions of an image to be viewed at higher resolution than others. This improves performance because only the data for the region being viewed must be decoded, rather than the entire file being re-rendered at a sharper resolution with each progressive 'zoom.' Such decoding flexibility promises substantial bandwidth savings during file transmission, as well as an improved overall user experience.
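One way this selective access can work is through tiling: if a master is encoded in independent tiles, a viewer only needs the tiles that overlap the area on screen, and at coarse zoom levels only each tile's low-resolution data. The sketch below assumes a hypothetical 6000 × 4000 image encoded with 1024-pixel square tiles; the tile size and function name are illustrative assumptions. Real codestreams also offer finer-grained units (precincts and code-blocks), and interactive delivery of this kind is the subject of JPIP (JPEG2000 Part 9).

```python
from typing import List, Tuple


def tiles_for_viewport(x0: int, y0: int, x1: int, y1: int, level: int,
                       tile_size: int, img_w: int, img_h: int) -> List[Tuple[int, int]]:
    """Return (column, row) indices of the tiles a viewer must fetch to show a viewport.

    The viewport [x0, x1) x [y0, y1) is given in pixel coordinates at reduction
    `level` (0 = full resolution, 1 = half size, ...). Tiles are assumed square,
    `tile_size` pixels on a side at full resolution, anchored at (0, 0).
    """
    scale = 2 ** level
    # Map the viewport back onto the full-resolution grid and clip it to the image.
    fx0, fy0 = max(0, x0 * scale), max(0, y0 * scale)
    fx1, fy1 = min(img_w, x1 * scale), min(img_h, y1 * scale)
    if fx0 >= fx1 or fy0 >= fy1:
        return []  # viewport lies entirely outside the image
    cols = range(fx0 // tile_size, (fx1 - 1) // tile_size + 1)
    rows = range(fy0 // tile_size, (fy1 - 1) // tile_size + 1)
    return [(c, r) for r in rows for c in cols]


# Hypothetical 6000 x 4000 master with 1024-pixel tiles (6 x 4 = 24 tiles in total).
zoomed = tiles_for_viewport(2000, 1500, 2800, 2100, level=0, tile_size=1024, img_w=6000, img_h=4000)
print(len(zoomed))  # 4: only these tiles are decoded at full resolution
overview = tiles_for_viewport(0, 0, 750, 500, level=3, tile_size=1024, img_w=6000, img_h=4000)
print(len(overview))  # 24, but only the lowest-resolution data in each tile is needed
```

The design choice illustrated here is that cost scales with what the user actually sees: a full-resolution close-up touches 4 of 24 tiles, while an overview touches every tile but reads only a small part of each.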
Dissemination features and storage savings, however, must be balanced against the requirement to safeguard the integrity of digital information assets for future generations, and any alternative preservation file format must be weighed against clear criteria to ensure the longevity of preserved images.
The purpose of this paper is to propose that JPEG2000 be adopted as a preservation format for digital raster images at LAC, through a balanced consideration of the format's advantages and the organization's needs.
JPEG2000 is a comparatively new image compression standard based on advances in wavelet technology (see www.jpeg.org). The standard was developed by the Joint Photographic Experts Group (JPEG), a joint committee of the International Organization for Standardization (ISO), the International Electrotechnical Commission (IEC), and the ITU Telecommunication Standardization Sector (ITU-T). The Joint Photographic Experts Group is the same committee that published the now ubiquitous JPEG standard in 1992, but with a different set of international commercial and academic participants.
The impetus for a new JPEG standard arose from a desire to overcome many of the limitations of the original standard while accommodating a broadening range of applications for JPEG technology. With this in mind, the group set out to create a new standard in accordance with the following basic objectives:
The desire to create an option for lossless compression deserves particular attention from those interested in long-term preservation. The original baseline JPEG is "lossy," meaning that an image, once compressed, cannot be restored exactly to its uncompressed state. Though irreversible, the resulting differences are for the most part minute and visually unnoticeable, or "visually lossless." In some cases, however (image preservation being one of them), truly "lossless" compression is required, so that a compressed image can be restored, bit for bit, to its original uncompressed state.
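The property that makes bit-for-bit recovery possible is that JPEG2000's lossless path uses a reversible integer wavelet (the 5/3 filter) computed by "lifting," in which every step uses integer arithmetic and can be undone exactly. The sketch below applies one level of that transform to a single row of pixel values, using Python's floor division to mirror the standard's floor operations and symmetric extension at the edges. It is illustrative only: the function names are hypothetical, and an actual codec applies the transform in two dimensions over several levels, then entropy-codes the coefficients, which is where the size reduction actually occurs.

```python
from typing import List, Tuple


def _reflect(i: int, n: int) -> int:
    """Whole-sample symmetric extension of a length-n signal (x[-1] = x[1], x[n] = x[n-2])."""
    if i < 0:
        i = -i
    if i >= n:
        i = 2 * (n - 1) - i
    return i


def _reflect_detail(k: int, m: int) -> int:
    """Index into m detail coefficients; d[-1] = d[0] and d[m] = d[m-1] follow from the
    symmetric extension of the input signal."""
    if k < 0:
        k = -k - 1
    if k >= m:
        k = 2 * m - 1 - k
    return k


def forward_53(x: List[int]) -> Tuple[List[int], List[int]]:
    """One level of the reversible 5/3 lifting transform: returns (low-pass, high-pass)."""
    n = len(x)
    if n < 2:
        raise ValueError("signal must have at least two samples")
    # Predict step: each odd sample minus the floor-average of its even neighbours.
    d = [x[2 * k + 1] - (x[_reflect(2 * k, n)] + x[_reflect(2 * k + 2, n)]) // 2
         for k in range(n // 2)]
    m = len(d)
    # Update step: adjust each even sample using the neighbouring detail coefficients.
    s = [x[2 * k] + (d[_reflect_detail(k - 1, m)] + d[_reflect_detail(k, m)] + 2) // 4
         for k in range((n + 1) // 2)]
    return s, d


def inverse_53(s: List[int], d: List[int]) -> List[int]:
    """Undo forward_53 exactly by running the lifting steps in reverse order."""
    n, m = len(s) + len(d), len(d)
    x = [0] * n
    # Undo the update step first to recover the even samples.
    for k in range(len(s)):
        x[2 * k] = s[k] - (d[_reflect_detail(k - 1, m)] + d[_reflect_detail(k, m)] + 2) // 4
    # Then undo the predict step using the recovered even samples.
    for k in range(m):
        x[2 * k + 1] = d[k] + (x[_reflect(2 * k, n)] + x[_reflect(2 * k + 2, n)]) // 2
    return x


row = [52, 55, 61, 66, 70, 61, 64, 73, 200]  # hypothetical pixel values
low, high = forward_53(row)
print(inverse_53(low, high) == row)  # True: the original row is recovered bit for bit
```

By contrast, the irreversible 9/7 wavelet used for lossy JPEG2000 works with real-valued coefficients and quantization, so it cannot guarantee this exact round trip.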
JPEG published its first draft specification for JPEG2000 in 1999; many regarded it not simply as an upgrade to the previous format but as a new standard altogether. The standard is divided into twelve parts, most of which have proceeded through the formal ISO/IEC standardization process. A short description of each follows:
While advances in technology promise vastly increased opportunities for access to and dissemination of information, the rate of hardware and software obsolescence is alarming from a file preservation point of view.2 Clear criteria must therefore be established to ensure that file formats selected for digital preservation will permit long-term, undiminished access to digital information assets. After reviewing similar criteria sets published by the Library of Congress, the National Archives (UK), and the National Library of the Netherlands,3 Library and Archives Canada has established the following five criteria, which represent common threads in the sets produced by these institutions.
In addition to these five criteria, further corporate considerations include:
JPEG2000's compliance with each of LAC's criteria is ranked low, medium, or high:
From an implementation perspective, JPEG2000's performance and the organizational preparedness of Library and Archives Canada also deserve consideration:
A scan of JPEG2000's adoption in peer cultural institutions reveals some noteworthy trends. For one, the format has evidently not been widely adopted by cultural memory institutions as a preservation format for still raster images. Of the institutions reviewed, only the British Library, the National Library of Norway, and Smithsonian Libraries appear to use JPEG2000 for preservation purposes. Others, however, including the National Diet Library of Japan, the National Library of the Netherlands, and the State Library of Queensland, are at a similar stage to Library and Archives Canada in examining or providing recommendations regarding JPEG2000 for image preservation. The format is, by contrast, widely used as a web access format, though most institutions choose to retain TIFF masters in accordance with industry practice.
In the area of film, however, motion JPEG2000 has had a more substantial impact, most notably through the adoption of motion JPEG2000 (MJ2 or MJP2) by Digital Cinema Initiatives (DCI) as an industry standard for digital cinema compression. Other industries that have adopted JPEG2000 to various extents include military imaging, criminal investigation, and geospatial imagery.
LAC's JPEG2000 Preservation File Format Working Group recommends that the organization adopt JPEG2000 part 1 (baseline) as a preservation file format for digital raster images.
From the perspective of the working group, JPEG2000 represents an appropriate balance between strategic corporate considerations and criteria established to ensure the long-term preservation of digital image assets (including integration into LAC's developing Trusted Digital Repository). As LAC's image collection expands (a trend that is expected to accelerate as plans to digitize large analog collections are realized), available storage space will diminish beyond already critical levels. Increased storage efficiency, then, makes JPEG2000 an attractive alternative to TIFF. Additionally, features such as progressive transmission and region of interest decoding promise to reduce bandwidth requirements while increasing access flexibility. The format scores highly against LAC preservation format criteria of openness and standardization, and scores adequately with regard to uptake among peer cultural institutions, stability/compatibility, and dependencies/interoperability. The biggest risks to LAC's adoption of the format would appear to be a lack of industry uptake for preservation purposes, backward incompatibility, and a lack of native browser support. However, it is the working group's assumption that JPEG2000's ubiquity and interoperability will continue to expand over time, and that the risks it currently poses are not critical enough to exclude the format as a preservation option.
1 See Gillesse et al. (2008), Buckley (2008), Bernier (2006), Janosky & Witthus (2003).
2 The UNESCO Charter on the Preservation of Digital Heritage cites rapid hardware and software obsolescence as a key factor in putting the world's digital heritage at risk. (http://portal.unesco.org/ci/en/ev.php-URL_ID=13367&URL_DO=DO_TOPIC&URL_SECTION=201.html)
3 See Gillesse et al. (2008); Rauch, Carl et al. "File-Formats for Preservation: Evaluating the Long-Term Stability of File-Formats." Proceedings ELPUB2007 Conference on Electronic Publishing, Vienna, Austria, 2007. http://elpub.scix.net/data/works/att/122_elpub2007.content.pdf; National Archives (UK). "Selecting File Formats for Long-Term Preservation." (2003). http://www.nationalarchives.gov.uk/documents/selecting_file_formats.rtf; Library of Congress. "Sustainability of Digital Formats: Planning for Library of Congress Collections." (2007). http://www.digitalpreservation.gov/formats/sustain/sustain.shtml.
4 See Murray (2004).
5 See Gillesse et al. (2008).
6 See Chai & Bouzerdoum (2001), Gormish (1999).
7 See Gillesse et al. (2008), Buckley (2008), Bernier (2006), Janosky & Witthus (2003).
8 See Yale (2008), Murray (2004).
9 See Appendix B.