In all domains the amount of digital information is increasing rapidly, which raises crucial questions of preservation. Our intellectual capital, as laid down in educational, scientific, public, cultural and other intellectual resources, is increasingly put at risk by the volatile character of digital objects and by rapid developments in information technology.
The growing need for adequate management and preservation of digital information is recognized in many stakeholder communities. Library and Archives Canada (LAC) is committed to being a Trusted Digital Repository (TDR) that provides reliable, long-term access to the digital documentary heritage of Canada.
Increasingly, the documentary heritage of Canada is being born digital and made accessible to Canadians in digital form. The rapid move to a digital environment has changed everything the LAC mandate touches - publishing, government, research, learning, and culture. LAC has therefore set as a primary objective to become a truly digital institution.
The LAC acquires digital content on a large scale and across a broad range, including digital publications, selective websites, large web domains, blogs, electronic government records, digital photos and art, digital audio-visual, geomatics, electronic theses from Canadian universities, digital technical and architectural drawings, private textual electronic records and broadcast data. The LAC also generates considerable digital content through a large-scale digitization program.
The Virtual Loading Dock (VLD) is the first step towards the implementation of LAC's TDR. It addresses the OAIS [1] ingest services and is the gateway to the TDR, intended eventually to capture all digital content ingested by LAC. This paper describes the VLD application and the role it plays in helping LAC meet its mandate for digital preservation. It then describes the business requirements for Legal Deposit and the digital preservation challenges LAC faces. Finally, it describes the technical design and implementation of the VLD within the LAC TDR.
LAC is guided by its mandate to preserve the documentary heritage of Canada for the benefit of present and future generations, to be a source of enduring knowledge accessible to all, and to serve as the continuing memory of the Government of Canada and its institutions.
Preservation of the LAC collection of digital materials is based on the broad mandate established by the Library and Archives of Canada Act (http://laws.justice.gc.ca/en/L-7.7/80647.html). It is governed by the more specific powers in the legislation that relate to the transfer of government and ministerial records of historical or archival value and the transfer of government records at risk, by the powers that relate to the Legal Deposit of online publications and the representative sampling of the Internet, and by the provisions in the accompanying Legal Deposit of Publications Regulations.
The rapidly growing collection of digital and digitized content at LAC needs to be properly managed within a comprehensive set of processes, tools and repositories. Some of the key challenges facing the LAC include:
Recognizing the challenges of digital preservation has propelled the LAC to adopt a new business framework: the TDR. The LAC has committed to a multi-year project to develop a suite of TDR business and technology services that establish a reliable, flexible, integrated digital preservation infrastructure. The LAC TDR is based on the OAIS reference model [1]; it will offer a set of trusted services that provide reliable and persistent access to the digital collections at LAC, along with reliable storage and long-term preservation.
The first step towards the implementation of the LAC TDR is the development of the Virtual Loading Dock (VLD) to enable the capture (ingest) of digital assets. Over the long term, the intent is for the VLD to capture and ingest all born-digital and digitized assets into the LAC collections, whether they are submitted by suppliers, publishers, government departments or donors on physical media, by email, by electronic transfer (FTP, OAI harvester), by web form, or manually collected by LAC staff.

Figure 1. TDR High-Level Design
The development of these trusted services closely follows international standards, best practices and guidelines for ensuring the integrity, authenticity and ability to view digital assets within the trusted digital repositories. Following are the metadata standards as well as open protocols and tools currently in use within the implementation of LAC's TDR.
Metadata standards
Open protocols and tools
The high-level design of LAC's TDR is depicted in Figure 1. The highlighted area indicates the scope of the VLD.
SIP processing, following the OAIS Submission Information Package (SIP) model, is the overall process flow for this iteration of the VLD, which addresses LAC's requirements for legal deposit and digital published heritage. The VLD is designed to receive digital assets, validate their integrity, extract technical and descriptive metadata about them, and prepare the SIP. A SIP comprises one or more digital object files and the metadata describing those files in a standards-based representation. Assets are stored in the VLD until they are appraised to determine whether they should become part of LAC's digital collection or be discarded.
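The receive, validate and prepare steps described above can be illustrated with a minimal Python sketch. It is not the VLD implementation: the class names (`Sip`, `SipFile`), the status values and the use of supplier-declared SHA-256 checksums as the integrity check are all assumptions made for illustration, since the paper does not specify how integrity is validated.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SipFile:
    """One digital object file inside a SIP (hypothetical structure)."""
    name: str
    size_bytes: int
    sha256: str


@dataclass
class Sip:
    """Hypothetical in-memory SIP: object files plus descriptive metadata."""
    descriptive: dict
    files: list = field(default_factory=list)
    status: str = "awaiting_appraisal"


def prepare_sip(assets, declared_checksums, descriptive):
    """Check each received asset against its declared checksum, then build a SIP.

    `assets` maps a file name to its bytes; `declared_checksums` maps a file
    name to the SHA-256 hex digest the supplier is assumed to have provided.
    """
    sip = Sip(descriptive=dict(descriptive))
    for name, payload in assets.items():
        actual = hashlib.sha256(payload).hexdigest()
        if actual != declared_checksums.get(name):
            raise ValueError(f"integrity check failed for {name}")
        # Size and checksum stand in for extracted technical metadata.
        sip.files.append(SipFile(name, len(payload), actual))
    return sip


def appraise(sip, keep):
    """Record the appraisal decision: accept into the collection or discard."""
    sip.status = "accepted" if keep else "discarded"
    return sip
```

A SIP built this way stays in the `awaiting_appraisal` state until `appraise` is called, mirroring the holding role the VLD plays before appraisal.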
The solution is built as custom components and configurations developed using an underlying base of commercial off-the-shelf software products and leveraging open-source technologies. The solution manages the lifecycle of digital assets in three distinct phases:
The phases are mapped to high-level functional processes within the solution, as shown in the VLD Functional Component Architecture diagram in Figure 2. The current iteration of the VLD solution addresses the ingestion workflow for digital publications.
VLD Functional Components
The following are brief descriptions of the various VLD functional components depicted in Figure 2.
1 Ingest Manager
2 SIP Processing Module
3 Ingestion Connectors
4, 5 METS Transformation Module (Metadata Handling)
4, 5 METS Update Module (Metadata Handling)

Figure 2. Functional Component Architecture
6 Workflow Framework
Controls the workflows used throughout the VLD. The design goal of this module is to control the main processing flow of an asset from ingestion to final SIP storage.
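As a rough illustration of a module that controls an asset's flow from ingestion to SIP storage, the following Python sketch runs an ordered list of named steps and stops at the first failure. The `Workflow` and `Asset` classes and the step names are hypothetical; the paper does not describe the workflow engine's actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Asset:
    """Hypothetical asset record tracked by the workflow."""
    identifier: str
    history: List[str] = field(default_factory=list)
    failed_step: str = ""


# A step receives the asset and returns True on success, False on failure.
Step = Callable[[Asset], bool]


class Workflow:
    """Runs named steps in order and records how far each asset got."""

    def __init__(self) -> None:
        self._steps: List[Tuple[str, Step]] = []

    def add_step(self, name: str, action: Step) -> "Workflow":
        self._steps.append((name, action))
        return self

    def run(self, asset: Asset) -> bool:
        for name, action in self._steps:
            if not action(asset):
                # Stop at the first failing step so it can be reviewed.
                asset.failed_step = name
                return False
            asset.history.append(name)
        return True


def always_ok(asset: Asset) -> bool:
    """Placeholder step used only to show how the pipeline is wired."""
    return True


ingest_workflow = (
    Workflow()
    .add_step("receive", always_ok)
    .add_step("validate", always_ok)
    .add_step("extract_metadata", always_ok)
    .add_step("store_sip", always_ok)
)
```

Keeping the steps as data rather than hard-coded calls is one way to let the same engine drive different ingestion flows.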
7 Tools Framework
Provides a repository for, and APIs to query, information about the tools available to perform specific tasks within the VLD. This framework allows tools to be added and replaced within the tools execution framework with a minimum of recoding effort. VLD tools should, wherever possible, be open-source, third-party tools with no custom coding. The tools framework comprises the following components:
The following tools are currently in use:

Figure 3. Tools Execution Framework
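The idea of a tools framework in which tools can be added or replaced with little recoding can be sketched as a small registry keyed by task. This Python example is an assumption-laden illustration: `ToolRegistry`, the task name `identify_format` and the two stand-in tool functions are hypothetical, and the stand-ins only inspect leading bytes rather than invoking any real third-party tool.

```python
from typing import Callable, Dict


class ToolRegistry:
    """Maps a task name to the tool currently configured to perform it."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, object]] = {}

    def register(self, task: str, name: str, run: Callable[[bytes], str]) -> None:
        # Registering a second tool for the same task replaces the first,
        # so callers that ask for the task need no code changes.
        self._tools[task] = {"name": name, "run": run}

    def describe(self, task: str) -> str:
        """Return the name of the tool configured for a task."""
        return str(self._tools[task]["name"])

    def execute(self, task: str, payload: bytes) -> str:
        """Run the configured tool for a task on the given payload."""
        run = self._tools[task]["run"]
        return run(payload)


def magic_bytes_identifier(payload: bytes) -> str:
    """Stand-in identifier: recognizes PDF by its leading bytes only."""
    return "application/pdf" if payload.startswith(b"%PDF") else "unknown"


def extended_identifier(payload: bytes) -> str:
    """Stand-in replacement tool that also recognizes PNG signatures."""
    if payload.startswith(b"\x89PNG"):
        return "image/png"
    return magic_bytes_identifier(payload)


registry = ToolRegistry()
registry.register("identify_format", "magic-bytes", magic_bytes_identifier)
```

Swapping in the second tool is a single `register` call for the same task name; code that calls `registry.execute("identify_format", ...)` is unchanged.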
8 Virtual Loading Dock (VLD) Store Module
9 Metadata Repository
10 Quality Assurance (QA) Services
11 Transfer Mechanism Services
12 Supplier Framework
13 Receiving Zone
14 Secure Zone
15 Storage Services
16 Reporting Services
Integration of the VLD Functional Components
The following sections and figures further explain the integration between the different functional components of the VLD solution.
Ingest Manager, Ingestion Connectors and the Tools Framework.

Figure 4. Ingestion Connectors
Metadata / METS Handling
The diagram in Figure 5 illustrates the sources and flows of metadata through the system. It shows the original sources of metadata, both the online forms and system-extracted metadata. It also shows the data flow that results when an LAC Internet Unit staff member edits metadata for a specific asset, and the resulting update in METS.
Metadata regarding a publication asset is derived from the following sources: Publisher Profile, Publication Profile and Extracted Metadata.

Figure 5. Metadata / METS Handling
Extracted Metadata is processed to extract applicable XMP or MIX data for the METS descriptive sections. Other specific elements will be driven by the Publication and Publisher Profiles. The purpose of the METS transformation module is to collect the various metadata elements from the discrete repositories and map that metadata into the LAC implementation of the METS schema.
The LAC METS implementation tracks the metadata and events surrounding intellectual entities (a book or serial publication) and the physical files that comprise those intellectual entities. Publication profiles and Publisher profiles will also feed this METS profile.
The generation of a METS record takes place after automated metadata extraction and (before/after) manual validation of an asset.
The modules required to consolidate and assemble the METS record are being coded in Perl to make use of its native file handling, strong XML processing capabilities and small execution footprint. Since every asset that flows through the VLD will require at least one METS file (for physical attributes) and potentially two (for intellectual attributes), the recommended approach does not include Java processing, which avoids the overhead of starting the JVM.
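To make the mapping of profile and extracted metadata into a METS record more concrete, here is a minimal sketch. It is written in Python purely for illustration (the modules described above are coded in Perl) and it does not follow the LAC METS profile, which the paper does not reproduce. The function name `build_mets`, the input dictionaries and the choice of Dublin Core for the descriptive section are assumptions; the element and attribute names come from the public METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("dc", DC_NS)


def q(namespace: str, tag: str) -> str:
    """Build a namespace-qualified name in ElementTree's {uri}tag form."""
    return f"{{{namespace}}}{tag}"


def build_mets(publication: dict, files: list) -> str:
    """Assemble a minimal METS document from profile and file metadata.

    `publication` is assumed to carry 'title' and 'publisher' (from the
    publication and publisher profiles); each entry in `files` is assumed
    to carry 'mime', 'size' and 'href' (from extracted metadata).
    """
    root = ET.Element(q(METS_NS, "mets"))

    # Descriptive section for the intellectual entity.
    dmd = ET.SubElement(root, q(METS_NS, "dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    ET.SubElement(xml_data, q(DC_NS, "title")).text = publication["title"]
    ET.SubElement(xml_data, q(DC_NS, "publisher")).text = publication["publisher"]

    # File section for the physical files, linked from the structural map.
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"))
    struct = ET.SubElement(root, q(METS_NS, "structMap"))
    div = ET.SubElement(struct, q(METS_NS, "div"), {"DMDID": "DMD1"})
    for index, item in enumerate(files, start=1):
        file_id = f"FILE{index}"
        entry = ET.SubElement(
            group,
            q(METS_NS, "file"),
            {"ID": file_id, "MIMETYPE": item["mime"], "SIZE": str(item["size"])},
        )
        ET.SubElement(
            entry,
            q(METS_NS, "FLocat"),
            {"LOCTYPE": "URL", q(XLINK_NS, "href"): item["href"]},
        )
        ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": file_id})

    return ET.tostring(root, encoding="unicode")
```

The descriptive section describes the intellectual entity, while the file section and structural map describe and link the physical files, which parallels the intellectual and physical split described above.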
Contact Information Management Integration
LAC has a centralized contact information management system (CIM) that provides integrated authentication and user metadata storage across multiple systems. Through the various ingestion connectors, the VLD can authenticate suppliers and access specific metadata elements about them.
The following are key activities LAC will be addressing over the next year:
Special thanks to Pam Armstrong, Manager, Digital Repository Services and Standards Office, and Steve Sekerak, Enterprise Architect; LAC's TDR development and implementation would not be possible without their strong leadership.
[1] OAIS, Reference Model for an Open Archival Information System (OAIS). CCSDS 650.0-B-1, Blue Book, January 2002
http://public.ccsds.org/publications/archive/650x0b1.pdf
[2] METS (Metadata Encoding and Transmission Standard)
www.loc.gov/standards/mets
[3] MODS (Metadata Object Description Schema)
www.loc.gov/standards/mods
[4] MARC (Machine Readable Cataloguing)
www.loc.gov/marc
[5] PREMIS (Preservation Metadata: Implementation Strategies)
www.loc.gov/standards/premis
[6] METS Simple Rights Schema
www.loc.gov/standards/mets/news080503.html
[7] Dublin Core
http://dublincore.org
[8] Government of Canada Records Management Metadata Standard
www.collectionscanada.gc.ca/government/products-services/007002-5002.27-e.html
[9] OAI (Open Archives Initiative)
www.openarchives.org
[10] PureFTP
www.pureftpd.org/project/pure-ftpd
[11] HTTP Apache, Tomcat
www.apache.org
[12] PHP Hypertext Processor
www.php.net/
[13] SOAP (Simple Object Access Protocol)
www.w3.org/TR/soap/
[14] REST Architecture
http://rest.blueoxen.net/cgi-bin/wiki.pl
[15] JHOVE (JSTOR/Harvard Object Validation Environment)
http://hul.harvard.edu/jhove/
[16] DROID (Digital Record Object Identification)
http://droid.sourceforge.net/wiki/index.php/Introduction
[17] PRONOM Online Registry
www.nationalarchives.gov.uk/pronom
[18] Heritrix (Open Source Web Crawler)
http://crawler.archive.org/
[19] Wayback (Open Source Web Archive Access)
http://archive-access.sourceforge.net/projects/wayback/
[20] LDAP (Lightweight Directory Access Protocol)
http://ca.php.net/ldap
[21] International Standard ANSI/NISO Z39.50
www.loc.gov/z3950/agency/
[22] International Standard ANSI/NISO Z39.88 OpenURL
www.niso.org/kst/reports/standards?step=2&gid=None&project_key=d5320409c5160be4697dc046613f71b9a773cd9e