Library and Archives Canada

Persistent Identification

Environmental Scan - February 2008

Today valuable intellectual and cultural resources increasingly reside on public networks like the Internet. The design of effective information management systems will encourage the use of these resources into the future. However, organizations must put in place infrastructure that affords long-term access. The implementation of a system for persistent identification of digital resources is one component of this infrastructure.

Based on an earlier report by the former National Library of Canada,1 this paper reviews existing persistent identification schemes and related services. It was developed following an environmental scan conducted by Library and Archives Canada (LAC) in late 2007. This paper aims to provide information to support the development of best practices in assigning identifier metadata to digital resources.

Following an overview of Internet addressing, the paper situates persistent identification within broader life cycle management practices. The body of the document describes existing identifier schemes and services. Resources appended to the paper include a glossary, profile of implementations, and selected bibliography.

Background

From the birth of the World Wide Web, the Uniform Resource Locator (URL) has been used to identify and reference network resources. However, without consistent maintenance, URL-based addressing can lead to a labyrinth of dead ends.

URLs are not durable identifiers because they are location-based. Links are severed when resources are no longer maintained at referenced URLs. As digital objects change custody, file structures are reorganized, and domain names are bought and sold, relationships between objects and URLs are broken. Users often encounter the message '404 error: File not found'.

Broken links frustrate users, and they also carry quantifiable costs. Organizations that maintain digital objects regularly commit resources to verify links and implement processes to increase URL durability (e.g., consistent naming conventions). Ultimately these strategies are not viable long-term solutions to persistence. They are resource intensive and cannot meet the needs of all stakeholders (e.g., the user's need for simplicity, the developer's need for sustainable code, the organization's need for marketability and branding). Nonetheless, it is likely that URLs will continue to be the primary Internet addressing mechanism in the short to medium term.

While the need for persistent identification of digital resources is clear, broadly implementing formalized identification systems will require substantial resource investment. Perhaps more importantly, realizing persistence will require an ongoing commitment to maintain system infrastructure.

As knowledge work flourishes, Canadian businesses need dependable and timely access to digital resources to support key functions. At the same time, the Canadian public needs access to information that enhances their quality of life. Safeguarding both intellectual and cultural content is the responsibility of creators, publishers and custodians of digital resources. In so far as these resources represent the diverse knowledge and experiences of Canadians, persistent and open access must be ensured today and into the future.

Overview

In the early 1990s the functionality of hypertext systems was combined with existing Internet infrastructure to create the World Wide Web. Hyperlinks make dynamic connections between information resources residing on distributed networks. Web applications let users seamlessly navigate these pages of digital text, image and multimedia.

The Web relies on formal standards and technological specifications for information exchange between networked computers. The computing community, coordinated by organizations like the Internet Engineering Task Force (IETF) and World Wide Web Consortium (W3C), is responsible for the development and maintenance of this infrastructure. The entire concept of the Web rests on the assignment of globally unique identifiers to network resources.

Addressing on the Internet

Internet Protocol (IP)

The Internet is a global framework of interconnected computer networks. A collection of network protocols facilitates communication between computing endpoints, allowing data packets to be transmitted between them. The backbone of the Web, Internet Protocol (IP) is an addressing mechanism that supports data transmission.

Maintained by the Internet Assigned Numbers Authority (IANA), IP addresses identify devices on a network. Every entity connected to the Internet possesses a unique IP address expressed as a series of numbers separated by dots (e.g., 137.122.6.60). The IANA allocates blocks of IP addresses to bodies like registries and Internet service providers.

Today the protocol exists in two versions, IPv4 and its successor IPv6. Using 32-bit addressing, IPv4 provides just over 4 billion unique addresses. Developed to meet increasing demand for new network addresses, 128-bit IPv6 can provide about 3.4 × 10^38 addresses.2 IPv6 is not yet widely implemented. Interoperability between the versions will support the ongoing transition.
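The size of each address space follows directly from the address width. A minimal Python sketch of the arithmetic, using only the standard library; the IPv6 address shown is an illustrative value from the documentation range and does not appear in this paper:

```python
import ipaddress

# Address space is 2 raised to the address width in bits.
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

print(f"IPv4: {ipv4_total:,} addresses")
print(f"IPv6: {ipv6_total:.3e} addresses")

# The standard library reports the same widths for parsed addresses.
v4 = ipaddress.ip_address("137.122.6.60")  # example address used in the text
v6 = ipaddress.ip_address("2001:db8::1")   # illustrative address (assumption)
print(v4.version, v4.max_prefixlen)
print(v6.version, v6.max_prefixlen)
```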

Domain Name System (DNS)

While computers use IP addresses to identify entities on the network, humans use domain names in their place. A domain name is an intelligible, user-friendly label that identifies a particular IP address. The Domain Name System (DNS) translates these labels into IP addresses for manipulation by computers.

The DNS employs a form of indirect naming that requires the relationship between domain names and IP addresses to be continually maintained. Within this system, a hierarchy of servers manages authoritative information about domains and sub-domains. Web browsers and other Internet applications query DNS servers to resolve domain names to IP addresses.

The domain name,
www.collectionscanada.gc.ca,

resolves to the IP address,
142.78.40.177
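An application typically delegates this lookup to the operating system's resolver. The sketch below is one way to do so in Python; the function name resolve_ipv4 is hypothetical, and the addresses returned for any hostname depend on what the DNS holds at the time of the query.

```python
import socket

def resolve_ipv4(hostname: str) -> list[str]:
    """Return the IPv4 addresses the system resolver reports for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})
```

On a networked machine, resolve_ipv4("www.collectionscanada.gc.ca") would return whatever addresses are currently registered for that name, which need not match the address shown above.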

While the assignment and resolution of IP addresses is transparent to end users, the pervasiveness of the Web and the Hypertext Transfer Protocol (HTTP) has familiarized the public with the concept of network addressing. Users commonly understand that URL strings represent resource location.

URI, URL and URN

To standardize Internet architecture, the computing community develops technical specifications in the form of IETF Request for Comments (RFC) documents. RFC 3986, a generic specification for the Uniform Resource Identifier (URI),3 defines a syntax for the identification and resolution of web resources. In earlier versions of this specification identifiers were typed. A URI was classed as a URL,4 a URN (Uniform Resource Name),5 or a URC (Uniform Resource Characteristic).6 As the concept of addressing developed, casting of discrete types was seen as unnecessarily complex.

Today the URI concept includes both the URL and URN schemes as subsets. A URI can be a locator, a name, or both. Locators identify resources by the path to their position on a particular host. Naming is a form of location-independent identification. It seeks to mitigate problems associated with tying resource identification to location.
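The difference between a locator and a name is visible in how a generic URI parser splits each string. The following Python sketch is illustrative only: the describe function, the /index.html path, and the rule of thumb "a URI with a host component names a location" are assumptions made for the example, not part of any specification.

```python
from urllib.parse import urlsplit

def describe(uri: str) -> dict:
    """Split a URI and note whether it carries a host (location) component."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc or None,
        "path": parts.path,
        # Rough heuristic for this example only.
        "names_a_location": bool(parts.netloc),
    }

locator = describe("http://www.collectionscanada.gc.ca/index.html")
name = describe("urn:ISSN:0259-000X")

print(locator)
print(name)
```

The URL yields a host and a path on that host, while the URN yields only a scheme and a name string with no host at all.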

Although 'URI' is the officially sanctioned term, 'URL' continues to dominate popular usage. The IETF recommends that future technical documentation use the generic term 'URI', yet many members of the computing community continue to use the two interchangeably. Terminology issues notwithstanding, there is broad consensus on the value of implementing location-independent identifiers. Persistent identification is now understood as a requisite for long-term access to digital resources.

Persistent identification

Digital life cycle management

Many domains and communities of practice use the life cycle concept to organize stages through which an entity progresses over time. For example, the records management community uses a life cycle model to understand phases in the existence of information resources, from record creation through to disposition. While there is currently no formal model for the life cycle of digital objects, principles of records and information management have been recommended to this end.7

Like other types of information resources, digital objects must be effectively managed over time and across all phases of their existence. The stages of any specific digital life cycle will vary according to the community managing the resource. However, all digital life cycle models will include stages that reflect resource creation or acquisition, management and preservation, as well as use.8

Each of these stages is associated with key responsibilities for the digital resource. One of the strengths of the life cycle approach is its ability to highlight the range of stakeholders that participate in the management of resources over time.9 Factors such as a distributed infrastructure and successive transfers of custody increase the need to monitor activities across life cycle phases. Invoking this type of model underlines the importance of integrating stakeholders to ensure that access is protected in the long-term.

Work to develop Trusted Digital Repositories (TDR)10 acknowledges that interdependencies exist between creation, management and use,8 and between communities responsible for specific activities in these phases. Access depends on the development and implementation of strategies and infrastructure that encourage ongoing and coordinated resource management. To realize long-term access, persistent identification must be integrated in life cycle models.

Supporting persistence

Discussions of persistence often focus heavily on assigning unique identifiers. However, this perspective is narrow. Persistence is best understood as the end result of a number of components operating together as a well-functioning system.

A system for persistent identification is composed of standards and technical specifications, resolution services, registry systems, naming authorities, etc. Schemes for assigning identifiers are only one part of the system. With identifiers in place, persistence is contingent on organizational commitment to appropriate policies and regular system administration. Technical standards must be paired with a firm commitment to maintain this infrastructure.

While systems may differ in their arrangement of components, all must incorporate three generic elements: an identification scheme, related resolution services, and a supporting infrastructure. A basic understanding of these key elements will allow the reader to compare the specific schemes and systems described in the body of this report.

Identification scheme
Systems of persistence are built on standardized schemes for resource identification. Schemes are defined by specifications that formalize their syntax (i.e., the arrangement of elements in the identifier string). Identification schemes also address technical issues including canonical form, permissible characters, prescriptive punctuation, case sensitivity and mandatory elements.

Resolution services
Resolution is the process of translating a resource identifier into information about that resource (i.e., a location, description, or the resource itself). Resolution is the bridge between resource identification and access. Resolution services use algorithms to locate information about identifiers within the context of a particular namespace. The majority of today's web browsers resolve only URLs. Support for location-independent resource naming will likely be incorporated in future applications. In the meantime, alternate resolution infrastructure must be put in place.

Maintaining persistence
Persistent identification systems hinge on coordinated management. Namespaces must be registered, and procedures for assigning identifiers must be implemented. More importantly, organizations must commit to maintaining associations between digital resources and related metadata. Without ongoing administration, persistent access will not be maintained. To be effective, a persistent identification system must incorporate some type of policy-related infrastructure.

Identifier schemes and services

Entities described in the body of this report include non-proprietary community standards, distributed resolution services, and commercial information management systems. Some represent only a single system component, for example the URN identifier scheme. Others, like The Handle System®, incorporate several components.

The decision to adopt a particular system of persistent identification is best made within the context of a broader organizational information management strategy. Components described in this report must be evaluated to ensure suitability within a given use case.

URN-based identifier schemes

Internet Engineering Task Force (IETF) www.ietf.org/

The development of the URN concept followed directly from problems associated with location-based identification. A URN is a URI with the properties of a name. A URN does not imply resource location: "The purpose or function of a URN is to provide a globally unique, persistent identifier used for recognition, for access to characteristics of the resource or for access to the resource itself."11 Resource names reference a single object over all stages of the life cycle.

The URN concept provides a common namespace for a number of different types of identifiers. Prior to defining the formal URN specification, the IETF released an information document that enumerated the minimum functional requirements for identifiers within the URN architecture (see Table 1). Any namespace operating as a URN-based implementation must meet these functional requirements.

Table 1
Functional requirements for resource identification systems11

Global scope
Identifiers are location-independent and have the same meaning everywhere

Global uniqueness
Identifiers are unique (i.e., one identifier does not reference information associated with multiple resources)

Persistence
Identifiers continue to uniquely reference resources beyond the lifetime of the resources themselves

Scalability
Identifiers are scalable and can be assigned to any resource

Legacy support
Identifiers can support legacy identification schemes to the extent that these satisfy minimum requirements

Extensibility
Identification schemes can accommodate future extensions

Independence
Responsible authorities maintain and assign resource identifiers within a given system

Resolution
Identifiers are supported by services that enable their translation

Technical specifications

The URN scheme dates from 1997 when the IETF formally defined its two-part canonical syntax.5 A URN is composed of the Namespace Identifier (NID) and the Namespace Specific String (NSS).

urn:<NID>:<NSS>

e.g., urn:ISSN:0259-000X

The NID element contains the identification code for the specific namespace. As an implementer of URN architecture, a namespace meets the functional requirements listed in Table 1. Each namespace is free to define syntax for the NSS element. The NSS is an identifier string that points to a resource within the context of a given namespace.
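As a sketch of the two-part syntax, the Python function below splits a URN into its NID and NSS. The pattern is a simplification loosely modelled on the 1997 syntax (an NID of up to 32 letters, digits, and hyphens); it does not validate the characters permitted in the NSS, and the name parse_urn is hypothetical.

```python
import re

# "urn:" prefix, then an NID of 1-32 letters, digits, or hyphens
# (not starting with a hyphen), then the namespace-specific string.
URN_PATTERN = re.compile(
    r"^urn:([A-Za-z0-9][A-Za-z0-9-]{0,31}):(.+)$", re.IGNORECASE
)

def parse_urn(urn: str) -> tuple[str, str]:
    """Return (NID, NSS) for a URN, or raise ValueError."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a URN: {urn!r}")
    nid, nss = match.groups()
    # The NID is treated as case-insensitive and normalised to lower case;
    # the NSS is returned unchanged because each namespace defines its syntax.
    return nid.lower(), nss

print(parse_urn("urn:ISSN:0259-000X"))
```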

Resolution services

The URN framework distinguishes between naming schemes and resolution services. By separating the way names are resolved from the way they are constructed and assigned, any service can conceivably resolve a URN. However, this independence makes necessary the use of registry mechanisms to locate appropriate services. The Dynamic Delegation Discovery System (DDDS)12 was specified for this purpose but, to date, has yet to be widely implemented.

The DDDS includes specifications for an algorithm, application, and database for resolution services. The application uses the DDDS algorithm to dynamically retrieve standardized Naming Authority Pointer (NAPTR) resource records, using the DNS as a distributed database. NAPTR records contain specific string transformation rules that the application applies to the URN for which authoritative resolution services are being sought.12 While the DDDS specification does not prescribe a protocol for communication between applications and resolver services, the Trivial HTTP resolution protocol (THTTP) is the only such specification currently in existence.13 THTTP is a simple convention for encoding resolution service requests that can be easily implemented by HTTP servers. Table 2 describes the steps associated with URN resolution.

Table 2
Steps in URN resolution

  • The application parses the URN string and identifies the NID encoded namespace
  • A resolution request is made and an application queries the DNS for information related to the namespace
  • Query results are returned as one or more NAPTR records that point to available resolution services and appropriate protocols
  • The application retrieves information about the location of servers that provide resolution for specific protocols and domains
  • Resolution is complete when the authoritative server is located, and the URN is resolved to one or more URIs, resources, descriptions of resources, etc.
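The steps in Table 2 can be sketched in Python with an in-memory table standing in for the DNS. Everything in the table is invented for illustration: the resolver hosts, the order values, and the simplified record layout (real NAPTR records carry more fields and a delimited rewrite expression). The request path follows the THTTP convention of a /uri-res/ prefix, a service name such as N2L, and the URN as the query string.

```python
import re
from urllib.parse import quote

# Hypothetical stand-in for NAPTR records held in the DNS. Keys imitate
# lookup names of the form NID.urn.arpa; every value here is invented.
FAKE_NAPTR = {
    "issn.urn.arpa": [
        # (order, service, match rule, resolver host)
        (20, "N2L", r"^urn:issn:.*$", "resolver-b.example.org"),
        (10, "N2L", r"^urn:issn:.*$", "resolver-a.example.org"),
    ],
}

def resolve_urn(urn: str) -> str:
    """Return a THTTP-style request URL for the given URN."""
    # Step 1: parse the URN string and identify the namespace (NID).
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    nid = parts[1].lower()

    # Step 2: query the (simulated) DNS for records about the namespace.
    records = FAKE_NAPTR.get(f"{nid}.urn.arpa", [])

    # Step 3: keep the records whose rule matches, lowest order first.
    matching = sorted(r for r in records if re.match(r[2], urn, re.IGNORECASE))
    if not matching:
        raise LookupError(f"no resolution service known for {urn!r}")

    # Step 4: read the service name and resolver host from the best record.
    _, service, _, host = matching[0]

    # Step 5: build the request that asks the resolver for a location.
    return f"http://{host}/uri-res/{service}?{quote(urn, safe=':')}"

print(resolve_urn("urn:ISSN:0259-000X"))
```

The final step stops at constructing the request; sending it and interpreting the resolver's response are left out of the sketch.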

Maintenance

All URN namespaces are registered with the IANA using a specification template.14 These documents contain standard registration information as well as a declaration of syntactic structure and information on resolution. Namespace registrants are responsible for assigning unique and persistent identifiers within their domain.

Implementation

While a number of URN namespaces have been registered with the IANA, URN-based schemes have yet to be widely implemented because of a lack of consensus on a standard resolution system. The DDDS draft standard continues to be experimental. The Internet community will need to develop more robust resolution infrastructure before URN-based namespaces can be effectively implemented. Implementations of URN-based systems are described in Appendix B.

Persistent Uniform Resource Locator (PURL)

OCLC Online Computer Library Center (OCLC) www.purl.oclc.org/

Persistent Uniform Resource Locators (PURL) are an interim solution to persistence loosely based on the URN concept. The development of the PURL system assumes that the ideal solution to location-based addressing is the implementation of IETF URN architecture. However, it also recognizes that the pace of standards development is incapable of meeting current challenges in a timely manner.15 Developed by the OCLC, PURL services are compatible with many of the functional requirements of URNs. When URN-based systems are more broadly implemented, PURLs will be translated into this form.16

A URL identifies a digital resource by pointing to its network location. A PURL is a URL that points directly to an intermediate resolution service. This service links the PURL with a URL and returns the resource address to the client. The URL associated with a PURL represents the current physical location of the resource on a particular host.

Technical specifications

PURLs and URLs have a one-to-one relationship. Like URNs, PURLs are stable names for resources. Resource location may change over time but the assigned PURL remains constant. PURL services are responsible for maintaining up-to-date URLs for resources under their control.

A PURL consists of three components – a protocol, resolver address and user-assigned resource name.

<protocol><resolver address><name>

e.g., http://purl.oclc.org/OCLC/PURL/FAQ

The resolver address ensures global uniqueness of the PURL. The namespace may be hierarchically divided into a top level and various sub-domains. It is not necessary for the PURL name component to duplicate any portion of the string of its related URL. Both domain names and PURLs are persistent – neither can be deleted after creation.
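Because a PURL is syntactically an ordinary URL, its three components can be recovered with a generic URL parser. A minimal Python sketch, in which the function name split_purl and the dictionary keys are assumptions chosen to mirror the terms used above:

```python
from urllib.parse import urlsplit

def split_purl(purl: str) -> dict:
    """Split a PURL into protocol, resolver address, and name."""
    parts = urlsplit(purl)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {purl!r}")
    return {
        "protocol": parts.scheme,
        "resolver_address": parts.netloc,
        "name": parts.path,
    }

print(split_purl("http://purl.oclc.org/OCLC/PURL/FAQ"))
```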

Resolution service

A PURL resolver provides resolution services for all PURLs associated with its domain namespace. The PURL system uses HTTP to connect to resolution services, and standard HTTP redirects to return information to requesting clients. The DNS resolves the PURL resolver address component into a conventional IP address. The PURL service provider at that address subsequently resolves the resource name and returns to the client the location of the resource in the form of a URL.

Successful resolution requires that the association be maintained between a PURL and a valid URL. This relationship is not automatically updated. If a URL becomes outdated resolution will fail. Maintenance agents are responsible for updating information for their resources with the appropriate PURL server. The functioning of the PURL system is entirely dependent on the commitment to this service.
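The resolver's role can be pictured as a lookup table plus an HTTP redirect. The Python sketch below is a toy model, not the OCLC software: the target URLs are invented, and the choice of status 302 is an assumption standing in for "a standard HTTP redirect".

```python
# Hypothetical PURL table: name -> current URL of the resource.
PURL_TABLE = {
    "/OCLC/PURL/FAQ": "http://www.example.org/docs/purl-faq.html",
}

def resolve_purl(name: str, table: dict = PURL_TABLE) -> tuple[int, dict]:
    """Return an HTTP status and headers, as a resolver might."""
    target = table.get(name)
    if target is None:
        return 404, {}
    # Redirect the client to wherever the resource currently lives.
    return 302, {"Location": target}

def move_resource(name: str, new_url: str, table: dict = PURL_TABLE) -> None:
    """Record a new location; the PURL itself never changes."""
    if name not in table:
        raise KeyError(name)
    table[name] = new_url

print(resolve_purl("/OCLC/PURL/FAQ"))
```

The table is only as good as its upkeep: if a resource moves and move_resource is never called, the redirect points at a dead location even though the PURL still resolves.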

Maintenance

OCLC hosts a PURL server on which any user can create new sub-domains and maintain PURLs. Users must register with OCLC to access this server. However, server resolution software is freely available. Users can download software to establish their own PURL resolution services independent of the OCLC.17 PURL software includes functionality to create and maintain PURLs in a distributed environment. Maintaining current URLs is a shared responsibility. PURL utilities permit only authorized users to edit the database. The software is also capable of validating links between PURLs and URLs. Users have the option of receiving periodic validation reports which identify failed PURLs. This functionality assists in the maintenance of valid URLs.

Implementation

PURL resolvers, available at a variety of locations on the Internet, act as gateways for local digital collections. Individuals wanting to create and manage PURLs can become registered users at any PURL resolver site. The primary implementation of PURLs is the OCLC Bibliographic PURL service (see Appendix B). The OCLC does not publish a comprehensive list of registered users.

The Handle System®

Corporation for National Research Initiatives (CNRI) www.handle.net

Developed by the Corporation for National Research Initiatives (CNRI), the Handle System® is composed of a namespace and open protocol, and resolution and administrative services. A handle is a unique name assigned to a digital information resource. As name-based identifiers, handles are not dependent on resource location. The Handle System® is a global namespace and is capable of accommodating existing local namespaces.

While the Handle System® operates over the Internet, it does not rely on the DNS to provide resource naming services. Given the importance of the DNS for network routing, this system avoids overloading it with general resource naming functions.18 Instead a separate protocol, service model, and security features allow administrators to manage resource naming on the public network. As the system namespace is defined separately, the application can process handles without a URI prefix. However, it is also possible for registrants with URN namespaces to use handle technology to provide underlying naming and resolution services.

The system uses a hierarchical architecture. The CNRI maintains a Global Handle Registry (GHR) beneath which operate any number of Local Handle Services (LHS). An LHS is responsible for managing handles associated with multiple naming authorities, each of which has defined a local namespace. These name authorities oversee the creation and assignment of local resource identifiers.

Technical specifications

The Handle System® namespace is composed of two parts.19 The namespace prefix is a handle that identifies a particular naming authority, while the suffix is a local name assigned by that authority.

<Handle Naming Authority>/<Handle Local Name>

e.g., 145.76/jan2005-rk32498833

Naming authorities are globally unique within the Handle System®, and handle local names must be unique under their respective local namespace. Naming authorities may be arranged hierarchically, with child authorities existing beneath a parent namespace. Naming authority segments are separated by a period (".").
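A short Python sketch of this syntax follows. It splits a handle at the first slash and breaks the naming authority into its period-separated segments; the function name and dictionary keys are hypothetical.

```python
def parse_handle(handle: str) -> dict:
    """Split a handle into naming authority, authority segments, and local name."""
    # Only the first "/" separates the two parts; a local name may
    # itself contain further slashes.
    authority, sep, local_name = handle.partition("/")
    if not sep or not authority or not local_name:
        raise ValueError(f"not a handle: {handle!r}")
    return {
        "naming_authority": authority,
        "authority_segments": authority.split("."),
        "local_name": local_name,
    }

print(parse_handle("145.76/jan2005-rk32498833"))
```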

Resolution service

Resolution services are provided by a global system of servers that maintain information about handles and the repositories in which resources are stored. Client applications query the GHR to identify and locate the LHS associated with the handle's naming authority. The client can then communicate with the LHS where the handle is resolved. To improve resolution performance, an LHS will often cache results of GHR queries. This way, subsequent queries on the same handle can be answered locally, reducing traffic between client and server.
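The two-stage lookup and the effect of caching can be modelled with two dictionaries. This Python sketch is a toy: the registry contents, service identifier, and repository URL are invented, and for simplicity the cache sits in the client object, which is an assumption about placement rather than a description of the real software.

```python
# Hypothetical registry data: naming authority -> local service identifier.
GLOBAL_REGISTRY = {"145.76": "lhs-one"}

# Hypothetical local services: service identifier -> {handle: values}.
LOCAL_SERVICES = {
    "lhs-one": {
        "145.76/jan2005-rk32498833": [
            "http://repository.example.org/objects/rk32498833",
        ],
    },
}

class HandleClient:
    """Resolve handles, remembering which service owns each authority."""

    def __init__(self) -> None:
        self._authority_cache: dict[str, str] = {}
        self.ghr_queries = 0  # counts round trips to the global registry

    def _locate_service(self, authority: str) -> str:
        if authority not in self._authority_cache:
            self.ghr_queries += 1
            self._authority_cache[authority] = GLOBAL_REGISTRY[authority]
        return self._authority_cache[authority]

    def resolve(self, handle: str) -> list[str]:
        authority = handle.split("/", 1)[0]
        service = self._locate_service(authority)
        return LOCAL_SERVICES[service][handle]

client = HandleClient()
client.resolve("145.76/jan2005-rk32498833")
client.resolve("145.76/jan2005-rk32498833")
print(client.ghr_queries)
```

The second call is served from the cache, so the global registry is consulted only once for the naming authority.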

When accessing the system for resolution or administrative processes, client applications use Handle System® protocol.20 This protocol allows handle servers to authenticate clients as valid administrators. The system also accommodates proxy servers to process resolution requests using other protocols. The Handle System® HTTP Proxy Server enables handle resolution from standard web browsers. The proxy server queries the Handle System® for information on a handle, and responds to the client using HTTP.

Maintenance

Handle System® administrators can be assigned various levels of control. Authorized users can add and delete handles, and update their values. Administrative responsibilities can be defined for each handle, and performed from any location on the network. While the system includes services like authentication and data confidentiality, security is ultimately dependent on client and server practices. As an example, while LHS caching speeds processing it also introduces risks by eliminating authentication.

Implementation

Handle System® services and software are available through the CNRI. The CNRI also oversees the GHR management of naming authorities,21 and maintains the system of proxy servers that support requests in alternate protocols.22

HANDLE.NET software is made available through a public license agreement for download at no cost.23 However, to create or resolve handles users must enter into a service agreement with the CNRI. Registration as a Resolution Service Provider (RSP) carries a one-time fee of $50 US, which covers the assignment and registration of a new name authority in the GHR. An additional annual service fee of $50 US is also required. Annual fees can be prepaid at a discounted rate. The CNRI reserves the right to increase fees, as required, to support system functioning.

While a number of organizations use handles for persistent identification (see Appendix B), the major implementer of the Handle System® is the International Digital Object Identifier Foundation (IDF).

The DOI® System

International Digital Object Identifier Foundation (IDF) www.doi.org

The Digital Object Identifier (DOI) System is an infrastructure to manage digital content, developed as a collaborative project between the CNRI and the Association of American Publishers (AAP). Managed by the IDF, the DOI® System is a general framework for persistent identification of digital resources. Additional functionality allows implementers to customize services to meet specific business requirements. The DOI® System integrates resource description and resolution services with policies designed to support persistent access.24 The DOI® System is currently undergoing standardization through the International Organization for Standardization (ISO).25

The DOI® System consists of four components. A formal syntax and assignment rules for entity naming are standardized as ANSI/NISO Z39.84.26 Resolution services are provided via the Handle System®. The DOI® System also incorporates a data model and data dictionary for resource description. An overarching policy framework governs system operation.27

Technical specifications

DOIs are persistent and actionable identifiers with a two-part syntax. The prefix is composed of the Directory Code (DIR) and Registrant Code (REG). A forward slash ("/") marks the suffix which contains the DOI Suffix String (DSS).

DOI:<DIR>.<REG>/<DSS>

e.g., doi:10.1006/rwei.1999.0001

To distinguish the DOI® System from other implementations of the Handle System®, the DIR value remains constant at 10. The IDF assigns a REG to organizations registering DOI names. The suffix is a unique string assigned to resources by Registrants. The DSS can accommodate pre-existing identification schemes. It is the responsibility of Registrants to ensure the uniqueness of the DSS within each prefix.
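The prefix and suffix can be separated with a short pattern. The Python sketch below is a simplification: it accepts an optional doi: label, requires the directory code 10, and treats everything after the first slash as the suffix string. The function name and dictionary keys are hypothetical.

```python
import re

# Optional "doi:" label, directory code 10, registrant code, "/", suffix.
DOI_PATTERN = re.compile(r"^(?:doi:)?(10)\.([^/\s]+)/(\S+)$", re.IGNORECASE)

def parse_doi(doi: str) -> dict:
    """Split a DOI name into directory code, registrant code, and suffix."""
    match = DOI_PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"not a DOI name: {doi!r}")
    directory, registrant, suffix = match.groups()
    return {
        "dir": directory,
        "reg": registrant,
        "prefix": f"{directory}.{registrant}",
        "dss": suffix,
    }

print(parse_doi("doi:10.1006/rwei.1999.0001"))
```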

Registration Agencies (RA) provide services on behalf of specific user communities. RAs are allocated blocks of DOI prefixes to assign to Registrants. They are also responsible for maintaining infrastructure that allows Registrants to declare and manage DOIs. All Registrants are required to maintain up-to-date information for DOIs within their prefix(es). Persistence is dependent on the maintenance of current information on values to which DOIs can be resolved. The DOI Kernel Metadata Declaration (KMD) ensures interoperability of descriptive metadata. Table 3 summarizes elements that make up this metadata kernel.

Table 3
Descriptive metadata for information resources within the DOI® System

Metadata element
Definition

DOI
DOI name assigned to the resource

resourceIdentifier(s)
Identifier within another system(s) of identification

resourceName(s)
Common name of the resource

principalAgent(s), agentRole(s)
Information about resource creation, publication, etc.

StructuralType
Structure of the resource (i.e., physical fixation, digital fixation, performance, or abstract work)

mode(s)
Mode of perception (i.e., audio, visual, audiovisual, or abstract)

ResourceType
Description of the resource type (e.g., audio file, serial article, PDF, etc.)

Metadata is exchanged between RAs using Resource Metadata Declarations (RMD) specific to each type of resource (e.g., serial, eBook, sound recording, etc.). Both the KMD and RMDs are defined by XML (Extensible Markup Language) schemas.28
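The kernel elements in Table 3 can be pictured as a simple record with two controlled vocabularies. The Python sketch below is an illustration only, not the XML schema itself: the class, field names, and sample values are assumptions, and only the permitted values for structural type and mode are taken from the table.

```python
from dataclasses import dataclass, field

# Permitted values as listed in Table 3.
STRUCTURAL_TYPES = {"physical fixation", "digital fixation", "performance", "abstract work"}
MODES = {"audio", "visual", "audiovisual", "abstract"}

@dataclass
class KernelMetadata:
    """Illustrative record holding the kernel elements for one resource."""
    doi: str
    resource_names: list[str]
    structural_type: str
    modes: list[str]
    resource_type: str
    resource_identifiers: list[str] = field(default_factory=list)
    principal_agents: list[str] = field(default_factory=list)
    agent_roles: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values outside the two controlled vocabularies.
        if self.structural_type not in STRUCTURAL_TYPES:
            raise ValueError(f"unknown structural type: {self.structural_type!r}")
        unknown = set(self.modes) - MODES
        if unknown:
            raise ValueError(f"unknown mode(s): {sorted(unknown)}")

record = KernelMetadata(
    doi="10.1006/rwei.1999.0001",
    resource_names=["Sample article title"],  # invented value
    structural_type="digital fixation",
    modes=["visual"],
    resource_type="serial article",
)
print(record)
```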

The <indecs> Data Dictionary (iDD) describes all valid metadata elements and their relationships.29 It also serves as a repository for mapping DOI metadata to other schemes. All metadata declarations are validated against values in the iDD.

Resolution service

The DOI® System uses Handle System® technology to provide resolution services.30 DOI names can be resolved to one or more values. In single-point resolution, a DOI is resolved to a resource location (e.g., a URL).24 In multiple-point resolution a DOI is resolved to one or more related entities.

To leverage metadata captured by the DOI® System, an application must provide services beyond basic resolution. To manage services, DOIs with shared characteristics are arranged into Application Profiles (AP). All DOIs belong to the Base AP which includes services to identify the assigning RA, KMD, and primary URL for the resource.

The CNRI has developed a freely available resolver plug-in31 to extend standard web browsers and support DOI resolution in native form (e.g., doi:10.123/456). However, the IDF also maintains an HTTP proxy server for requests using URL syntax.32 While direct resolution is preferred to the use of proxy servers, the CNRI and IDF recognize that this server is essential for the continued integrity of DOI names on the Internet.24
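Converting a DOI in native form to a proxy request is a matter of dropping the doi: label and appending the name to the proxy address. In the Python sketch below the default proxy address https://doi.org is an assumption that does not come from this paper, and the function name is hypothetical.

```python
from urllib.parse import quote

def doi_to_proxy_url(doi: str, proxy: str = "https://doi.org") -> str:
    """Return the URL-syntax form of a DOI name for an HTTP proxy."""
    name = doi.strip()
    if name.lower().startswith("doi:"):
        name = name[4:]
    if not name.startswith("10.") or "/" not in name:
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path,
    # leaving the prefix/suffix slash intact.
    return f"{proxy}/{quote(name, safe='/')}"

print(doi_to_proxy_url("doi:10.123/456"))
```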

Maintenance

Effective functioning of the DOI® System depends on the success of underlying technologies (e.g., Handle System technology, iDD, etc.). Risks associated with this architecture are partly mitigated through the use of accepted standards. The system also conforms to both URI and URN specifications.

The DOI® System understands persistence as a function of organizations as opposed to purely computing technologies. Detailed IDF policies govern participation in the system by defining consistent practices that promote persistence.33

Implementation

To assign DOI names, resource custodians must remit a fee to the RA for their user community. RAs are free to set fees based on the nature of the services that they provide. While the IDF is a not-for-profit body, RAs may operate as for-profit or not-for-profit enterprises.

The IDF employs a cost-recovery model where all fees are leveraged towards system finance and development. Applicants for RA status must hold IDF membership, available for an annual fee of $35,000 US.24 RAs also pay a franchise fee per DOI name. As of 2006 this amounted to $0.04 US per name registered, with a minimum charge of $20,000 US annually.24 Maintenance fees are also charged by the IDF.
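
As a rough illustration of the franchise fee figures quoted above, the following Python sketch computes an RA's annual per-name charge. It assumes that the $20,000 US minimum applies to the yearly total of per-name fees; membership and maintenance fees are left out, and the function name is hypothetical.

```python
# Figures from the text (2006), held in cents to avoid rounding problems.
FEE_PER_NAME_CENTS = 4            # $0.04 US per DOI name registered
MINIMUM_CHARGE_CENTS = 2_000_000  # $20,000 US minimum per year


def annual_franchise_fee_cents(names_registered: int) -> int:
    """Return the yearly franchise fee in cents (assumed: a simple floor)."""
    if names_registered < 0:
        raise ValueError("names_registered must not be negative")
    return max(names_registered * FEE_PER_NAME_CENTS, MINIMUM_CHARGE_CENTS)


for count in (100_000, 500_000, 1_000_000):
    dollars = annual_franchise_fee_cents(count) / 100
    print(f"{count:>9,} names: ${dollars:,.2f} US")
```

Under this reading, the minimum charge is the binding figure for any RA registering fewer than 500,000 names in a year.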

The DOI® System is heavily implemented in the electronic publishing industry, with CrossRef acting as the primary RA (see Appendix B). A key focus of this community is the provision of effective reference linking services. The OpenURL framework (see Appendix A) provides context-sensitive reference linking services that accommodate DOI identifiers.

Archival Resource Key (ARK)

California Digital Library (CDL) www.cdlib.org/inside/diglib/ark/

Design and development of the Archival Resource Key (ARK) was based on the findings of a global scan of persistent identification systems undertaken by John Kunze at the US National Library of Medicine (NLM). ARK identifiers are globally unique URLs that provide actionable links to three types of information – digital objects, descriptive metadata, and commitment statements from service providers.

Fundamental to ARK is the principle that persistence is actually a commitment by custodians of digital resources to provide various services. Persistence is not the result of any scheme or name assignment process; it "... is neither inherent in an object nor conferred on it by a particular naming syntax."34 While obviously resource intensive, effective functioning of identification systems rests on custodians maintaining a mechanism for name indirection. Persistence is the outcome when both digital objects and identifiers are successfully managed.

The goal of ARK is to leverage existing Web infrastructure to implement a simple process of name assignment. Focus is shifted away from identifier schemes and towards a commitment to maintain services that afford persistence.

Technical specifications

As of January 2008, the ARK specification has not been formally approved as an IETF standard.34 Within the syntax, identifiers begin with the ARK label (ark:), followed by the Name Assigning Authority Number (NAAN) and the Name with an optional Qualifier. Identifiers may be preceded by a URL in the form of a Name Mapping Authority Hostport (NMAH). The syntax is represented as follows.

[http://NMAH/]ark:/NAAN/Name[Qualifier]

e.g., http://ark.cdlib.org/ark:/13030/ft4w10060w

The NMAH component is a replaceable web address to which ARK service requests are sent. ARKs including an NMAH are actionable URLs. Separating the NMAH from the identifier proper ensures longevity of ARKs. For example, as Web infrastructure evolves it will be possible to append ARK identifier strings to a retrieval protocol other than HTTP.34

The NAAN is a 5 or 9 digit numeric code that represents the Name Assigning Authority (NAA) responsible for naming the resource. Like URNs, NAANs are registered with the IANA, and represent the top level namespace within the ARK scheme. The resource Name is an identifier string composed primarily of digits and non-vowel alphabetic characters with an optional check character. The ARK scheme can accommodate hierarchical components in digital resources. An optional Qualifier string can be used to create entry points into a hierarchy.
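
The syntax described above can be illustrated with a small parser. The following Python sketch is not taken from the ARK specification: the pattern is a simplification, and it assumes that a Qualifier begins with "/" or "." and that the Name itself contains neither character. The second sample identifier, with its "/chap3" Qualifier, is invented.

```python
import re
from typing import NamedTuple, Optional


class Ark(NamedTuple):
    nmah: Optional[str]  # Name Mapping Authority Hostport, if present
    naan: str            # Name Assigning Authority Number (5 or 9 digits)
    name: str            # resource Name
    qualifier: str       # optional Qualifier, empty when absent


# Simplified pattern for [http://NMAH/]ark:/NAAN/Name[Qualifier]
ARK_PATTERN = re.compile(
    r"^(?:https?://(?P<nmah>[^/]+)/)?"
    r"ark:/(?P<naan>\d{5}|\d{9})/"
    r"(?P<name>[^/.?]+)"
    r"(?P<qualifier>[/.][^?]*)?$"
)


def parse_ark(text: str) -> Ark:
    """Split an ARK into its components, or raise ValueError."""
    match = ARK_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognizable ARK: {text!r}")
    return Ark(match["nmah"], match["naan"], match["name"],
               match["qualifier"] or "")


print(parse_ark("http://ark.cdlib.org/ark:/13030/ft4w10060w"))
print(parse_ark("ark:/13030/ft4w10060w/chap3"))  # invented Qualifier
```

Because the NMAH is optional in the pattern, the same identifier parses to the same NAAN and Name whether or not a hostport precedes it, which mirrors the separation of NMAH and identifier described above.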

An ARK is an actionable link to different types of information – digital objects, metadata, and commitment statements. The HTTP URL Mapping Protocol (THUMP)35 allows clients to request information for an ARK by appending a query string to the identifier. Beginning with one or more question marks ("?"), queries prompt THUMP-enabled web servers to respond to client HTTP GET or HTTP POST requests (see Table 4).

Table 4
Queries on ARK identifiers

Request | Example syntax | Response
ARK only | http://example.foo.com/object123 | Client is redirected to the resource
ARK and single question mark ("?") | http://example.foo.com/object123? | Client receives resource metadata
ARK and double question mark ("??") | http://example.foo.com/object123?? | Client receives a commitment to permanence statement
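
The three request forms in Table 4 differ only in the suffix appended to the ARK. A minimal Python sketch, using hypothetical function and key names, shows how a client might build each request URL.

```python
# Suffixes for the three kinds of request listed in Table 4.
QUERY_SUFFIX = {
    "resource": "",      # ARK only: redirect to the resource
    "metadata": "?",     # single question mark: resource metadata
    "commitment": "??",  # double question mark: permanence statement
}


def ark_request_url(ark_url: str, want: str = "resource") -> str:
    """Build the request URL for one of the three kinds of information."""
    if want not in QUERY_SUFFIX:
        raise ValueError(f"unknown request type: {want!r}")
    # Remove any question marks already present before adding the suffix.
    return ark_url.rstrip("?") + QUERY_SUFFIX[want]


ark = "http://ark.cdlib.org/ark:/13030/ft4w10060w"
for kind in QUERY_SUFFIX:
    print(kind, ark_request_url(ark, kind))
```

Whether a given server answers the "?" and "??" forms depends on that server being THUMP-enabled; the sketch only constructs the URLs.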

Electronic Resource Citation (ERC) records are used to respond to queries for resource metadata and permanence statements.36 ERC records describe digital objects with minimal metadata. Four Kernel metadata elements represent a streamlined version of Dublin Core.37 Alternatively ERC metadata may be augmented by non-Kernel elements from other vocabularies.

The Kernel tells the 'story' of a resource – who, what, when and where (see Table 5).

Table 5
ERC kernel metadata elements

who: responsible person or party

what: human-readable identifier (name)

when: date related to the life cycle of the object

where: machine-readable identifier (location)

ERC records are arranged into segments, each containing one story. The type of information contained in a story is specified by segment labels (see Table 6).

Table 6
ERC segment labels

erc: describes the expression of the resource

erc-about: describes resource content

erc-support: describes the support commitment to the resource made by a provider

erc-meta: describes the provenance of the metadata record
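
To make the structure of an ERC record more concrete, the following Python sketch models one segment as a story with the four Kernel elements. The plain "label: value" layout and all sample values are assumptions made for illustration; the ERC draft itself should be consulted for the actual record syntax.

```python
from dataclasses import dataclass

KERNEL_ELEMENTS = ("who", "what", "when", "where")
SEGMENT_LABELS = ("erc", "erc-about", "erc-support", "erc-meta")


@dataclass
class Story:
    """One ERC segment: a segment label plus the four Kernel elements."""
    label: str
    who: str
    what: str
    when: str
    where: str

    def __post_init__(self) -> None:
        if self.label not in SEGMENT_LABELS:
            raise ValueError(f"unknown segment label: {self.label!r}")

    def render(self) -> str:
        # Assumed layout: the segment label, then one "element: value" line each.
        lines = [f"{self.label}:"]
        lines += [f"{element}: {getattr(self, element)}"
                  for element in KERNEL_ELEMENTS]
        return "\n".join(lines)


# Invented sample values, for illustration only.
story = Story("erc", "Example Library", "Sample digitized book", "2008",
              "http://ark.cdlib.org/ark:/13030/ft4w10060w")
print(story.render())
```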

Resolution service

ARK services are provided by a system developed in conjunction with the California Digital Library (CDL). The noid (nice opaque identifier)38 utility creates minters that generate, track, and bind persistent identifiers to digital resources. Effectively, this utility is a database for managing the Name component of the ARK syntax. noid uses templates to mint identifiers according to predefined specifications. It also provides resolution services for ARK identifiers.
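
The idea of a minter that generates, tracks and binds identifiers can be sketched as follows. This Python example is not the noid utility and does not reproduce its template syntax: the class, its methods, the sequential numbering and the 29-character alphabet (digits plus consonants, here also omitting "l" and "y") are all assumptions chosen for illustration.

```python
from typing import Dict

# Assumed alphabet: digits and non-vowel letters.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


class Minter:
    """Toy minter: generates Names in sequence and binds metadata to them."""

    def __init__(self, prefix: str, width: int) -> None:
        self.prefix = prefix
        self.width = width
        self.counter = 0
        self.bindings: Dict[str, Dict[str, str]] = {}

    def mint(self) -> str:
        if self.counter >= len(ALPHABET) ** self.width:
            raise RuntimeError("minter exhausted")
        # Write the counter as a fixed-width number in base len(ALPHABET).
        n, chars = self.counter, []
        for _ in range(self.width):
            n, digit = divmod(n, len(ALPHABET))
            chars.append(ALPHABET[digit])
        self.counter += 1
        name = self.prefix + "".join(reversed(chars))
        self.bindings[name] = {}  # track the Name even before anything is bound
        return name

    def bind(self, name: str, element: str, value: str) -> None:
        if name not in self.bindings:
            raise KeyError(f"identifier was not minted here: {name}")
        self.bindings[name][element] = value

    def fetch(self, name: str) -> Dict[str, str]:
        return dict(self.bindings[name])


minter = Minter(prefix="ft", width=4)
name = minter.mint()
minter.bind(name, "where", "http://example.org/objects/1")  # invented location
print(name, minter.fetch(name))
```

Keeping the bindings in the same structure that issues the Names is what lets a single utility act both as minter and as the lookup table behind a resolver.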

Presented with an ARK for resolution, user applications parse the string to identify the optional NMAH. The presence of this component points the client to an active ARK service provider. The noid database operates behind this provider's web server. Server requests to activate a URL trigger the noid resolver to return database records that match the ARK.

In the absence of an NMAH component, the identifier is not actionable. In this case client applications use the NAAN component to locate NMAHs that provide services for ARKs within that namespace. NMAHs can be located using a simple name authority table available from the CDL for upload to local systems.39 It lists all assigned NAANs and the location of services for each. Alternatively, clients can locate NMAHs using the DNS. Based on URN resolution, this method uses NAPTR resource records and a simplified algorithm to locate appropriate services.
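
The table-based lookup can be sketched in Python as a dictionary keyed on NAAN. The table below is a stand-in for the name authority table, not a copy of it: the CDL entry reuses the NAAN and host from the example earlier in this section, and the second entry is invented. The function name is hypothetical.

```python
# Stand-in for the name authority table: NAAN -> hosts offering ARK services.
NAME_AUTHORITY_TABLE = {
    "13030": ["ark.cdlib.org"],    # CDL, as in the example ARK above
    "99999": ["ark.example.org"],  # invented entry for illustration
}


def make_actionable(ark: str, table=NAME_AUTHORITY_TABLE) -> str:
    """Prefix a bare ARK with an NMAH found through its NAAN."""
    if ark.startswith(("http://", "https://")):
        return ark  # already carries an NMAH
    if not ark.startswith("ark:/"):
        raise ValueError(f"not an ARK: {ark!r}")
    # The NAAN is the first path segment after the "ark:/" label.
    naan = ark[len("ark:/"):].split("/", 1)[0]
    hosts = table.get(naan)
    if not hosts:
        raise LookupError(f"no service known for NAAN {naan}")
    return f"http://{hosts[0]}/{ark}"


print(make_actionable("ark:/13030/ft4w10060w"))
```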

Maintenance

Persistence depends on the effective maintenance of associations between identifiers and information resources. An NAA is free to assign identifiers according to its own policies, but these policies must be publicly declared. An NAA must also have a documented strategy for the management of namespaces.

Implementation

ARK developers agree that the ARK scheme may not be appropriate for all organizations.34 The commitment to maintain services that support identifier persistence requires considerable resource expenditure. While the CDL does not explicitly publish a list of ARK implementers, participating organizations and projects are listed in publicly available NAAN tables.39 Development of the ARK scheme is ongoing. Appendix B describes implementations at the CDL and Bibliothèque nationale de France (BnF).

Summary

This report is an overview of various schemes and systems for the persistent identification of digital resources. While the problems associated with location-based resource identification are clear, each system approaches them through a slightly different lens. The report describes a range of strategies extending from technical specifications for identifier syntax, to elaborate policy frameworks aimed at creating a shared commitment to digital resources.

An evaluation of identifier schemes and systems should follow from a careful review of existing organizational identification practices. While legacy identifier systems are rarely designed with persistence in mind, they frequently reflect organizational work processes and trends among established communities of practice. Interoperability and sharing of information resources is the goal of all developments in network infrastructure. When assessing systems for persistent identification, it is important that organizations consider available technology as well as resources related to ongoing system maintenance.

Appendix A: OpenURL framework

OCLC Online Computer Library Center (OCLC) www.oclc.org/research/projects/openurl/

The OpenURL framework provides open and context-sensitive reference linking of scholarly resources.40 The framework is essentially a protocol for transporting metadata about linked information resources. OpenURL was developed through research at Ghent University, Belgium, and was later acquired by Ex Libris Group, vendor of the SFX OpenURL resolver.41 OpenURL underwent fast track standardization and was endorsed in 2004 as ANSI/NISO Z39.88.42 In 2006, OCLC was named as maintenance agency.

Context-sensitivity allows OpenURL to treat the 'appropriate copy' issue.43 Increasingly, articles are distributed across a number of collections and fall under the control of a variety of custodians. When multiple copies of a resource exist, access is often governed by more than one policy. OpenURLs work to ensure that user requests for resources are referred to copies that conform to their particular privileges. OpenURLs carry bibliographic metadata about the resource being referenced, as well as information about the network context within which this reference occurs, and about the context within which the request for service is issued.

An OpenURL transports packages consisting of a resource identifier and associated metadata. Packages, called ContextObject Representations, are compound data structures that bind together metadata about the resource, the resolver, and the requester. The OpenURL syntax is defined by the generic specification for URIs.3 The ContextObject is defined by the Key/Encoded-Value ContextObject Format, a specification that uses HTTP GET or HTTP POST requests for data transmission and processing.

The OpenURL syntax includes two components separated by a question mark (?). The BASE-URL is a locator for a service processing OpenURLs, and the QUERY describes metadata transported by the package.

BASE-URL '?' QUERY

e.g., http://sfxserver.uni.edu/sfxmenu?id=oai:arXiv:physics/0003005

OpenURLs are processed by OpenURL applications. Resolution occurs when the processor at the BASE-URL acts on the specified QUERY component.
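
The two-part syntax can be illustrated with Python's standard urllib.parse module. The sketch below only separates the BASE-URL from the QUERY and decodes the key/value pairs; it does not interpret them as a ContextObject, and the function name is hypothetical.

```python
from urllib.parse import parse_qs


def split_openurl(openurl: str):
    """Split an OpenURL into its BASE-URL and its decoded QUERY pairs."""
    base_url, separator, query = openurl.partition("?")
    if not separator:
        raise ValueError("an OpenURL needs a '?' between BASE-URL and QUERY")
    # parse_qs returns a dict mapping each key to a list of values.
    return base_url, parse_qs(query)


base_url, query = split_openurl(
    "http://sfxserver.uni.edu/sfxmenu?id=oai:arXiv:physics/0003005"
)
print(base_url)
print(query)
```

For the example above, the BASE-URL is http://sfxserver.uni.edu/sfxmenu and the QUERY carries a single key, id, whose value is the identifier of the referenced resource.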

In addition to acting as maintenance agency for OpenURL, OCLC also provides resolution-related services. A part of WorldCat Registry, the OpenURL Resolver Registry44 maintains information on available resolution services for OpenURLs. Strings received by the registry are redirected to local resolvers based on the IP address of the requesting entity. New implementations of the framework must apply to OCLC and download OpenURL 1.0 software.45
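
Redirection based on the requester's IP address can be sketched with Python's standard ipaddress module. The registry contents here are invented (the address ranges are reserved for documentation and the resolver addresses are placeholders), and nothing in this sketch reflects how the OCLC registry is actually implemented.

```python
import ipaddress
from typing import Optional

# Invented registry: address ranges mapped to the BASE-URL of a local resolver.
REGISTRY = [
    (ipaddress.ip_network("192.0.2.0/24"),
     "http://resolver.library-a.example/menu"),
    (ipaddress.ip_network("198.51.100.0/24"),
     "http://resolver.library-b.example/menu"),
]


def redirect(query: str, client_ip: str) -> Optional[str]:
    """Return the OpenURL at the requester's local resolver, if one is known."""
    address = ipaddress.ip_address(client_ip)
    for network, base_url in REGISTRY:
        if address in network:
            return f"{base_url}?{query}"
    return None  # no resolver registered for this address


print(redirect("id=oai:arXiv:physics/0003005", "192.0.2.17"))
```

The same QUERY is therefore sent to different BASE-URLs depending on who is asking, which is how a registry of this kind supports context-sensitive linking.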

Appendix B: Profile of system implementations

URN-based identifier schemes

National Library of Finland

The bibliographic community is currently the most visible URN-implementer. National Library of Finland (www.nationallibrary.fi/) assigns URNs in the form of National Bibliographic Numbers (NBN) to its digital holdings that lack publisher-assigned identifiers (e.g., ISBNs).46 An NBN is used to identify resources not published through conventional means.

The objective of the Networked European Deposit Library (NEDLIB)47 project is to develop an infrastructure to support the capture of electronic publications, in line with the deposit guidelines of various national libraries. Within the context of this project, the National Library of Finland has developed a URN-generator to create NBNs for assignment by authors, publishers, etc., to digital resources acquired during web-harvesting. Resolution services have been developed and implemented internally, in conjunction with the University of Helsinki.

Organization for the Advancement of Structured Information Standards (OASIS)

The Organization for the Advancement of Structured Information Standards (OASIS) (www.oasis-open.org/home/index.php) is a consortium focused on the development and adoption of open standards for the information society. As a part of its operations, OASIS produces a variety of documentation, including specifications, working drafts, schemas, etc. A URN-based OASIS namespace is defined by RFC 3121.48 OASIS uses this namespace in the generation of persistent identifiers for its electronic resources.

Persistent Uniform Resource Locator (PURL)

OCLC Bibliographic PURL service

The Program for Cooperative Cataloging (PCC) is an international cooperative program coordinated by the Library of Congress (LC) and participants contributing to the Cooperative Online Serials (CONSER) and Monographic Bibliographic Record Programs (BIBCO). The OCLC Bibliographic PURL service (http://bibpurl.oclc.org/) enables participants of CONSER and BIBCO to cooperatively maintain PURLs located in the MARC 856 field (Electronic location and address) of bibliographic records. The service allows participants to share responsibility for maintaining current links in the national catalogue.

The OCLC-hosted PURL server is freely available. The implementation of the OCLC service followed a comprehensive pilot project undertaken by CONSER in 2001 and 2002.49 In conjunction with Zepheira, OCLC has announced plans to re-architect the service.50 OCLC PURL services will be updated to reflect changes associated with the Semantic Web.

National Library of Australia (NLA)

The National Library of Australia (NLA) and a number of other heritage organizations are collaborating on the development and maintenance of PANDORA, Australia's Web Archive (http://pandora.nla.gov.au/). The objective of PANDORA is to provide long-term access to significant Australian online publications.

The NLA currently manages the Australian PURL Resolver Service for publishers to assign persistent identifiers to digital publications and websites (http://purl.nla.gov.au/). Initially this service was used to support access to resources in PANDORA. Later the NLA created its own schema for assigning persistent URLs to resources in its collections. These identifiers are automatically assigned by PANDORA's integrated Digital Archiving System (PANDAS), developed in-house and implemented in 2001. While PANDORA has moved beyond PURL identifiers, the NLA continues to provide PURL resolution services for publishers on its website.

The Handle System®

Library of Congress National Digital Library Program (NDLP)

The National Digital Library Program (NDLP) is an LC American Memory initiative (http://memory.loc.gov/ammem/index.html) involving the conversion of historical materials to digital format for access online. The NDLP uses the Handle System® to provide globally unique identifiers to digital information resources.51 As a registrant with the Handle System®, LC maintains local handle services that manage associations between persistent identifiers and information about digital resources. The LC server resolves handles on behalf of requesting clients. The NDLP has established an internal policy for the assignment of handles, as well as guidelines for the use of handles with referenced resources external to LC.

Defense Virtual Information Architecture (DVIA)

The Defense Technical Information Center (DTIC), Defense Advanced Research Projects Agency (DARPA) and CNRI are collaborating on the Defense Virtual Information Architecture (DVIA) (www.cnri.reston.va.us/dtic.html). The objective of the project is to continue development of the Digital Object Architecture, designed in conjunction with the Handle System®, while creating a digital library of DTIC data.

The project will see the CNRI develop a distributed repository for digital objects that uses handles to locate objects within the service. The Handle System® will provide resolution services for resources on the network. The project will also incorporate an existing system for information retrieval, as well as a user interface, to demonstrate how CNRI components might be used in an actual implementation. The registry uses OpenURL technology to provide context-sensitive linking to information resources. Further research and development continues for the DVIA project.

Advanced Distributed Learning (ADL) Initiative

The Advanced Distributed Learning (ADL) Initiative (www.adlnet.gov/index.aspx) develops and implements learning technologies across the US Department of Defense (DoD). ADL works to develop standards, tools and learning content for use in a distributed information environment. The Sharable Content Object Reference Model (SCORM) is an existing technical specification for the development and deployment of digital content objects across a network. This initiative involves the Content Object Repository Discovery and Resolution Architecture (CORDRA), an infrastructure for discovery, identification, resolution and delivery of these objects.

The Handle System® technology was selected by the ADL for generating and resolving persistent identifiers assigned to sharable content objects. The CNRI is also involved in the design of the ADL Registry (ADL-R). ADL-R will facilitate discovery and reuse of content by acting as a local registry of digital objects available across the DoD.

The DOI® System

CrossRef

CrossRef (www.crossref.org/index.html) allows users of scholarly publications to link from references to cited information resources. Membership in CrossRef is open to any scholarly publisher. CrossRef operates on a cross-publisher basis where publishers agree to link both to and from content held by other participants. Reciprocal linking is the key to moving between information resources held by different publishers.

CrossRef uses the DOI® System to uniquely identify digital resources. CrossRef is an RA within the context of the DOI® System,52 and as such is charged with providing infrastructure for publishers to declare and maintain metadata for their information resources. By depositing DOIs and descriptive metadata with the CrossRef system, publishers allow 'inbound' linking to resources in their collections. 'Outbound' links are processed by querying the CrossRef database to identify DOIs. CrossRef allows individual publishers to maintain control over full-text access to their resources. To integrate with conventional library systems, publishers using CrossRef are asked to implement OpenURLs.

National Research Council (NRC) Research Press53

A part of the National Research Council Canada (NRC), the NRC Research Press (http://pubs.nrc-cnrc.gc.ca/) publishes a number of journals, monographs, and conference proceedings. It is also mandated to support the scientific community by providing services to scientific publishers. As a scholarly publisher the NRC is a CrossRef member and uses DOIs to identify its scientific resources. NRC has a DOI prefix assigned by CrossRef. By depositing DOIs and metadata for its resources, NRC enables inbound linking to its content. Outbound links from NRC references are made by querying CrossRef to obtain DOIs for resources managed by other publishers.

Archival Resource Key (ARK)

California Digital Library (CDL)

The California Digital Library (CDL) (www.cdlib.org/) originated at the University of California. Its objective is to provide leadership in the innovative use of technology for the development of digital library collections and services.

While the CDL is involved in the development of ARK, it is also the primary system implementer. As of 2003, the CDL had assigned 80,000 ARKs to resources within its own collections.54 A strong commitment to maintain services is critical to persistence in the ARK system. For this reason, the CDL assigns PURLs to resources external to its control.

The NAAN for the CDL is 13030.55 CDL-generated ARKs use terminal check characters, and are limited to digits and non-vowel alphabetic characters. This strategy aims to mitigate risks associated with single character and transposition errors.54 To generate ARKs and bind associated metadata, the CDL uses the open-source noid application. While the CDL continues to honour its commitment to the persistence of ARKs in its care, the organization's formal commitment statement is not completely stable. The document will undergo further development and review to ensure that the CDL continues to meet the requirements for persistence as they evolve over time.
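
A short Python sketch shows how a terminal check character can catch single character and transposition errors. It follows the position-weighted scheme that we understand the noid documentation to describe; the alphabet, the weighting and the sample identifier are assumptions to be verified against that documentation, and the function names are hypothetical.

```python
# Assumed alphabet of 29 characters: digits plus consonants (no vowels, "l" or "y").
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def check_char(identifier: str) -> str:
    """Compute a check character from position-weighted character values."""
    total = 0
    for position, char in enumerate(identifier, start=1):
        # Characters outside the alphabet (such as "/") count as zero.
        value = max(ALPHABET.find(char), 0)
        total += position * value
    return ALPHABET[total % len(ALPHABET)]


def is_valid(identifier_with_check: str) -> bool:
    """Recompute the check character and compare it with the final character."""
    body, last = identifier_with_check[:-1], identifier_with_check[-1:]
    return bool(body) and check_char(body) == last


print(check_char("13030/xf93gt2"))  # → q
print(is_valid("13030/xf93gt2q"))   # → True
print(is_valid("13030/xf39gt2q"))   # → False (two characters transposed)
```

Because each character is weighted by its position, swapping two adjacent characters changes the sum, so the transposed identifier no longer matches its check character.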

Bibliothèque nationale de France (BnF)

The Bibliothèque nationale de France (BnF) (www.bnf.fr/) has implemented a system based on ARK for the persistent identification of digital resources in its collections. The primary consideration in the decision to adopt ARK was related to aligning resource identification with long-term preservation in the context of an OAIS-based digital repository.56

As a national library, the BnF holds collections composed of a diverse range of information resources – from serials and books, to born-digital content. The BnF uses ARK identifiers for objects with a variety of existing identification schemes (e.g., ISSN, ISBN, DOIs, etc.). In part, ARK was chosen because it can accommodate existing identification schemes associated with the BnF's various collection development and digitization work processes. ARKs also allow the BnF to reference resources with hierarchical parts (e.g., individual pages in a digitized book). The BnF hopes that within the context of a single digital repository, the implementation of ARK identifiers will simplify the management of resources over time.

Appendix C: Glossary

ACTIONABLE
A property that refers to the ability of a given application (e.g., a web browser), to translate an identifier string into information that can be used to provide access to the resource


CLIENT-SERVER ARCHITECTURE
Computer architecture characterized by a separation between clients and servers and implemented over a network. Client applications (e.g., web browsers) formulate requests to networked servers (e.g., web servers). After processing, servers return requested information to the client application


DIGITAL OBJECT
An entity within the scope of a digital information system (i.e., an information resource in a digital form)


DISTRIBUTED NETWORK
A form of computing architecture characterized by a number of independent and geographically dispersed computers, connected by a network (e.g., the Internet). These computers work together to complete processing associated with a given problem


DISTRIBUTED SERVICES
A method of computer processing where different parts of a single program run simultaneously on two or more computers, communicating using an accepted protocol and operating in a networked environment


DOMAIN NAME SYSTEM (DNS)
A system that associates memorable, alphanumeric domain names with IP (Internet Protocol) addresses. The DNS uses a resolver to translate the domain name in a URL into an IP address


HANDLE
A permanent identifier assigned to a digital resource that is not related to its location on a particular host


IDENTIFIER
An association between a string of character data and a digital resource. This abstraction is manifest by a record of the association


MINTER
A system that can be used to generate unique persistent identifiers for information resources


NAME INDIRECTION
The technique of maintaining key-value tables that allow referencing of an object using a name instead of the value itself


NAMESPACE
A set composed of unique identifiers, or names. An identifier defined in a given namespace is associated with that namespace. Thus, the meaning of an identifier is determined by its namespace. A namespace is specified by a formal syntax and semantics


NAMING AUTHORITY
A body responsible for managing the assignment of components associated with namespaces


PROXY SERVER
A server that receives client requests and forwards these to other servers. Proxy servers make requests to other servers on behalf of the client


RESOLUTION
The process of submitting a request to a network service, containing the identifier of an information resource, and receiving as a result, one or more pieces of information related to the resource


RESOLVER
A database capable of providing information about a resource identified by some scheme. Information may include resource location, description, or the resource itself


RESOURCE
A general term referring to any digital object (e.g., an electronic document, an image, a webpage, etc.)


SCALABILITY
A system property that refers to its capacity to process increasing volumes of work or be readily enlarged


STRING
An ordered sequence of characters from a predetermined set


SYNTAX
A set of rules that specifies a valid character string within the context of a particular namespace


SYSTEM
A set of entities that combine to operate as a whole

Appendix D: List of acronyms

The following table contains acronyms that appear in the body of this report. Acronyms are ordered alphabetically and are not grouped by scheme or system.

ACRONYM NAME
AAP Association of American Publishers
ANSI American National Standards Institute
AP Application Profile
ARIN American Registry for Internet Numbers
ARK Archival Resource Key
CDL California Digital Library
CNRI Corporation for National Research Initiatives
DCMI Dublin Core Metadata Initiative
DDDS Dynamic Delegation Discovery System
DIR Directory Code
DNS Domain Name System
DOI Digital Object Identifier or DOI System
DSS DOI Suffix String
ERC Electronic Resource Citation
GHR Global Handle Registry
HTTP Hypertext Transfer Protocol
IANA Internet Assigned Numbers Authority
iDD <indecs> Data Dictionary
IDF International Digital Object Identifier Foundation
IETF Internet Engineering Task Force
IP Internet Protocol
IPv4 / v6 Internet Protocol Version 4 / Version 6
ISO International Organization for Standardization
KMD Kernel Metadata Declaration
LC Library of Congress
LHS Local Handle Services
NAAN Name Assigning Authority Number
NAPTR Name Authority Pointer resource records
NBN National Bibliographic Numbers
NID Namespace Identifier
NMAH Name Mapping Authority Hostport
noid nice opaque identifier
NSS Namespace Specific String
OAIS Open Archival Information System
OCLC Online Computer Library Center
PURL Persistent Uniform Resource Locator
RA Registration Agencies
REG Registrant Code
RFC Request for Comments
RLG Research Libraries Group
RMD Resource Metadata Declarations
RSP Resolution Service Provider
TDR Trusted Digital Repository
THTTP Trivial HTTP resolution protocol
THUMP HTTP URL Mapping Protocol
URC Uniform Resource Characteristic
URI Uniform Resource Identifier
URL Uniform Resource Locator
URN Uniform Resource Name
W3C World Wide Web Consortium
XML Extensible Markup Language

Appendix E: Selected bibliography

General resources

Beagrie, N., and D. Greenstein. "A strategic framework for creating and preserving digital collections." Arts and Humanities Data Service, July 2001. http://ahds.ac.uk/strategic.pdf (accessed January 3, 2008).

Causton, L. "Identifying and describing web resources." European Commission DGXIII/E-4. www.iua.upf.es/~jblat/material/doctorat/web_resources.pdf (accessed July 16, 2007).

Electronic Resource Preservation and Access Network (erpanet). "Final report: Persistent Identifiers." erpaSeminar. Cork, Ireland. June 17-18, 2004. www.erpanet.org/events/2004/cork/Cork%20Report.pdf (accessed August 9, 2007).

Gilliland-Swetland, A.J. "Enduring paradigm, new opportunities: The value of the archival perspective in the digital environment." Washington, DC: Council on Library and Information Resources, February 2000. http://clir.org/pubs/reports/pub89/pub89.pdf (accessed January 22, 2008).

Hilse, H-W., and J. Kothe. Implementing persistent identifiers: Overview of concepts, guidelines and recommendations. London: Consortium of European Research Libraries, 2006. http://bibpurl.oclc.org/web/16923 (accessed July 26, 2007).

Online Computer Library Center (OCLC). "Trustworthy repositories audit and certification: Criteria and checklist." February 2007. www.crl.edu/PDF/trac.pdf (accessed January 22, 2008).

Research Libraries Group. "Trusted Digital Repositories: Attributes and responsibilities." May 2002. www.oclc.org/programs/ourwork/past/trustedrep/repositories.pdf (accessed January 22, 2008).

Watson, J. "The LIFE Project research review: Mapping the landscape, riding a life cycle." Lifecycle Information for E-Literature (LIFE), November 2005. http://eprints.ucl.ac.uk/1856/1/review.pdf (accessed February 3, 2008).

Resources on identifier schemes and related services

Uniform Resource Names (URN)

Daniel, R. "A trivial convention for using HTTP in URN resolution." RFC 2169. Internet Engineering Task Force, June 1997. http://tools.ietf.org/html/rfc2169 (accessed July 26, 2007).

Hoffman, P.E., and R. Daniel, Jr. "URN resolution overview." Internet-Draft. Expires October 21, 1995. International Federation of Library Associations. www.ifla.org.sg/documents/libraries/cataloging/metadata/urn3.txt (accessed July 26, 2007).

Mealling, M. "Dynamic Delegation Discovery System (DDDS) Part one: The comprehensive DDDS." RFC 3401. Internet Engineering Task Force, October 2002. http://tools.ietf.org/html/rfc3401 (accessed July 26, 2007).

----------. "Dynamic Delegation Discovery System (DDDS) Part two: The algorithm." RFC 3402. Internet Engineering Task Force, October 2002. http://tools.ietf.org/html/rfc3402 (accessed July 26, 2007).

----------. "Dynamic Delegation Discovery System (DDDS) Part three: The Domain Name System (DNS) database." RFC 3403. Internet Engineering Task Force, October 2002. http://tools.ietf.org/html/rfc3403 (accessed July 26, 2007).

----------. "Dynamic Delegation Discovery System (DDDS) Part four: The Uniform Resource Identifiers (URI) resolution application." RFC 3404. Internet Engineering Task Force, October 2002. http://tools.ietf.org/html/rfc3404 (accessed July 26, 2007).

----------. "Dynamic Delegation Discovery System (DDDS) Part five: URI.ARPA assignment procedures." RFC 3405. Internet Engineering Task Force, October 2002. http://tools.ietf.org/html/rfc3405 (accessed July 26, 2007).

Moats, R. "URN syntax." RFC 2141. Internet Engineering Task Force, May 1997. http://tools.ietf.org/html/rfc2141 (accessed July 26, 2007).

Sollins, K., and L. Masinter. "Functional requirements for Uniform Resource Names." RFC 1737. Internet Engineering Task Force, December 1994. http://tools.ietf.org/html/rfc1737 (accessed July 26, 2007).

Persistent Uniform Resource Locators (PURL)

Shafer, K., S. Weibel, E. Jul, and J. Fausey. "Introduction to Persistent Uniform Resource Locators." http://purl.oclc.org/docs/inet96.html (accessed July 26, 2007).

Weibel, S., E. Jul, and K. Shafer. "PURLs: Persistent Uniform Resource Locators." http://purl.oclc.org/docs/new_purl_summary.html (accessed January 22, 2008).

The Handle System®

Corporation for National Research Initiatives (CNRI). "HANDLE.NET services: Global Handle Registry®." Handle System. www.handle.net/introduction.html (accessed July 27, 2007).

---------. "Proxy server system." Handle System. www.handle.net/proxy.html (accessed July 27, 2007).

Sun, S., L. Lannom, and B. Boesch. "Handle System overview." RFC 3650. Internet Engineering Task Force, November 2003. http://tools.ietf.org/html/rfc3650 (accessed July 27, 2007).

Sun, S., S. Reilly, and L. Lannom. "Handle System namespace and service definition." RFC 3651. Internet Engineering Task Force, November 2003. http://tools.ietf.org/html/rfc3651 (accessed July 27, 2007).

Sun, S., S. Reilly, L. Lannom, and J. Petrone. "Handle System protocol (ver 2.1) specification." RFC 3652. Internet Engineering Task Force, November 2003. http://tools.ietf.org/html/rfc3652 (accessed July 27, 2007).

The DOI® System

International DOI Foundation (IDF). "DOI® handbook." Version 4.4.1. October 5, 2006. www.doi.org/hb.html (accessed July 30, 2007).

----------. "The DOI System: Introductory overview." www.doi.org/overview/sys_overview_021601.html (accessed July 30, 2007).

National Information Standards Organization (NISO). "ANSI/NISO Z39.84-2005: Syntax for the Digital Object Identifier." September 30, 2005. www.niso.org/standards/resources/Z39-84-2005.pdf (accessed July 30, 2007).

Archival Resource Key (ARK)

Dublin Core Metadata Initiative (DCMI) Metadata Community. "The DC kernel application profile." Draft 1. DCMI, August 7, 2007. http://dublincore.org/kernelwiki/KernelApplicationProfileDraft (accessed August 16, 2007).

Gamiel, K., J. Kunze, and N. Nassar. "THUMP – The HTTP URL Mapping Protocol." Internet-Draft. February 24, 2007. http://tools.ietf.org/html/draft-kunze-thump-02 (accessed January 28, 2008).

Kunze, J.A. "A metadata kernel for electronic permanence." October 5, 2001. http://jodi.tamu.edu/Articles/v02/i02/Kunze/kunze-final.pdf (accessed August 2, 2007).

----------. "Towards electronic persistence using ARK identifiers." July 2003. www.cdlib.org/inside/diglib/ark/arkcdl.pdf (accessed August 2, 2007).

Kunze, J.A., and R.P.C. Rodgers. "The ARK persistent identifier scheme." CDL-Draft. July 24, 2007. www.cdlib.org/inside/diglib/ark/arkspec.pdf (accessed August 16, 2007).

Kunze, J.A., and M.A. Russell. "[noid (Nice Opaque Identifier) minting and binding tool]". April 19, 2006. www.cdlib.org/inside/diglib/ark/noid.pdf (accessed August 2, 2007).

Kunze, J., and A. Turner. "Kernel metadata and Electronic Resource Citations (ERCs)." Draft. Dublin Core Metadata Initiative (DCMI) Kernel Metadata Task Group, August 11, 2007. http://dot.ucop.edu/home/jak/erc.html (accessed August 16, 2007).

OpenURL

Apps, A., and R. MacIntyre. "Why OpenURL?" D-Lib Magazine (May 2006) www.dlib.org/dlib/may06/apps/05apps.html (accessed June 7, 2007).

National Information Standards Organization (NISO). "ANSI/NISO Z39.88-2004. The OpenURL framework for context-sensitive services." April 15, 2007. www.niso.org/standards/resources/Z39_88_2004.pdf (accessed August 1, 2007).

Van de Sompel, H., and O. Beit-Arie. "Open linking in the scholarly information environment using the OpenURL framework." D-Lib Magazine (March 2001) www.dlib.org/dlib/march01/vandesompel/03vandesompel.html (accessed January 28, 2008).

Resources for Internet standards and technical development community

American National Standards Institute (ANSI)
www.ansi.org/

International Organization for Standardization (ISO)
www.iso.org/iso/en/ISOOnline.frontpage

Internet Assigned Numbers Authority (IANA)
www.iana.org/

Internet Corporation for Assigned Names and Numbers (ICANN)
www.icann.org

Internet Engineering Task Force (IETF)
www.ietf.org/home.html

National Information Standards Organization (NISO)
www.niso.org

National Library of Australia (NLA). Preserving Access to Digital Information (PADI)
www.nla.gov.au/padi/

World Wide Web Consortium (W3C)
www.w3.org

Request for Comment (RFC) documents

The IETF Request for Comment index is available at http://tools.ietf.org/rfc/

RFC no. Citation
RFC 1737 Sollins, K., and L. Masinter. "Functional requirements for Uniform Resource Names." RFC 1737. Internet Engineering Task Force, December 1994. http://tools.ietf.org/html/rfc1737 (accessed July 26, 2007).
RFC 1738 Berners-Lee, T., L. Masinter, and M. McCahill. "Uniform Resource Locators (URL)." RFC 1738. Internet Engineering Task Force, December 1994. http://tools.ietf.org/html/rfc1738 (accessed July 26, 2007).
RFC 2141 Moats, R. "URN syntax." RFC 2141. Internet Engineering Task Force, May 1997. http://tools.ietf.org/html/rfc2141 (accessed July 26, 2007).
RFC 2169 Daniel, R. "A trivial convention for using HTTP in URN resolution." RFC 2169. Internet Engineering Task Force, June 1997. http://tools.ietf.org/html/rfc2169 (accessed July 26, 2007).
RFC 3121 Best, K. "A URN namespace for OASIS." RFC 3121. Internet Engineering Task Force, June 2001. http://tools.ietf.org/html/rfc3121 (accessed July 26, 2007).
RFC 3188 Hakala, J. "Using National Bibliography Numbers as Uniform Resource Names." RFC 3188. Internet Engineering Task Force, October 2001. http://tools.ietf.org/html/rfc3188 (accessed July 26, 2007).
RFC 3305 Mealling, M., and R. Denenberg. "Report from the Joint W3C/IETF URI Planning Interest Group: Uniform Resource Identifiers (URIs), URLs, and Uniform Resource Names (URNs): Clarifications and recommendations." RFC 3305. Internet Engineering Task Force, August 2002. http://tools.ietf.org/html/rfc3305 (accessed July 26, 2007).
RFC 3401 Mealling, M. "Dynamic Delegation Discovery System (DDDS) Part one: The comprehensive DDDS." RFC 3401. Internet Engineering Task Force, October 2002. http://tools.ietf.org/html/rfc3401 (accessed July 26, 2007).
RFC 3402 Mealling, M. "Dynamic Delegation Discovery System (DDDS) Part two: The algorithm." RFC 3402. Internet Engineering Task Force, October 2002. http://tools.ietf.org/html/rfc3402 (accessed July 26, 2007).
RFC 3403 Mealling, M. "Dynamic Delegation Discovery System (DDDS) Part three: The Domain Name System (DNS) database." RFC 3403. Internet Engineering Task Force, October 2002. http://tools.ietf.org/html/rfc3403 (accessed July 26, 2007).
RFC 3404 Mealling, M. "Dynamic Delegation Discovery System (DDDS) Part four: The Uniform Resource Identifiers (URI) resolution application." RFC 3404. Internet Engineering Task Force, October 2002. http://tools.ietf.org/html/rfc3404 (accessed July 26, 2007).
RFC 3405 Mealling, M. "Dynamic Delegation Discovery System (DDDS) Part five: URI.ARPA assignment procedures." RFC 3405. Internet Engineering Task Force, October 2002. http://tools.ietf.org/html/rfc3405 (accessed July 26, 2007).
RFC 3650 Sun, S., L. Lannom, and B. Boesch. "Handle System overview." RFC 3650. Internet Engineering Task Force, November 2003. http://tools.ietf.org/html/rfc3650 (accessed July 27, 2007).
RFC 3651 Sun, S., S. Reilly, and L. Lannom. "Handle System namespace and service definition." RFC 3651. Internet Engineering Task Force, November 2003. http://tools.ietf.org/html/rfc3651 (accessed July 27, 2007).
RFC 3652 Sun, S., S. Reilly, L. Lannom, and J. Petrone. "Handle System protocol (ver 2.1) specification." RFC 3652. Internet Engineering Task Force, November 2003. http://tools.ietf.org/html/rfc3652 (accessed July 27, 2007).
RFC 3986 Berners-Lee, T., R. Fielding, and L. Masinter. "Uniform Resource Identifier (URI): Generic syntax." RFC 3986. Internet Engineering Task Force, January 2005. http://tools.ietf.org/html/rfc3986 (accessed July 26, 2007).
RFC 5013 Kunze, J., and T. Baker. "The Dublin Core metadata element set." RFC 5013. Internet Engineering Task Force, August 2007. http://tools.ietf.org/html/rfc5013 (accessed February 4, 2008).

1 M. Dickison, "Persistent locators for federal government publications: Summary of a study conducted for the Depository Services Program and the National Library of Canada," Ottawa: National Library of Canada, 2002 www.collectionscanada.ca/obj/r4/f2/r4-500.1-e.pdf (accessed January 22, 2008).

2 American Registry for Internet Numbers (ARIN), "IPv4 and IPv6," ARIN, www.arin.net/about_us/media/fact_sheets/IPv4_IPv6.pdf (accessed January 22, 2008).

3 T. Berners-Lee, R. Fielding, and L. Masinter, "Uniform Resource Identifier (URI): Generic syntax," RFC 3986, Internet Engineering Task Force (IETF), January 2005 http://tools.ietf.org/html/rfc3986 (accessed January 22, 2008).

4 T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform Resource Locators (URL)," RFC 1738, IETF, December 1994 http://tools.ietf.org/html/rfc1738 (accessed January 22, 2008).

5 R. Moats, "URN syntax," RFC 2141, IETF, May 1997 http://tools.ietf.org/html/rfc2141 (accessed January 28, 2008).

6 For a detailed discussion of classes, see M. Mealling and R. Denenberg, "Report from the Joint W3C/IETF URI Planning Interest Group: Uniform Resource Identifiers (URIs), URLs, and Uniform Resource Names (URNs): Clarifications and recommendations," RFC 3305, IETF, August 2002 http://tools.ietf.org/html/rfc3305 (accessed January 22, 2008).

7 J. Watson, "The LIFE Project research review: Mapping the landscape, riding a life cycle," Lifecycle Information for E-Literature (LIFE), November 2005 http://eprints.ucl.ac.uk/1856/1/review.pdf (accessed February 3, 2008).

8 N. Beagrie and D. Greenstein, "A strategic framework for creating and preserving digital collections," Arts and Humanities Data Service, July 2001 http://ahds.ac.uk/strategic.pdf (accessed January 22, 2008).

9 A.J. Gilliland-Swetland, "Enduring paradigm, new opportunities: The value of the archival perspective in the digital environment," Washington: Council on Library and Information Resources, February 2000 http://clir.org/pubs/reports/pub89/pub89.pdf (accessed January 22, 2008).

10 For information on TDRs, see Research Libraries Group (RLG), "Trusted Digital Repositories: Attributes and responsibilities," May 2002 www.oclc.org/programs/ourwork/past/trustedrep/repositories.pdf (accessed January 22, 2008); and Online Computer Library Center (OCLC), "Trustworthy repositories audit and certification: Criteria and checklist," February 2007 www.crl.edu/PDF/trac.pdf (accessed January 22, 2008). The TDR concept is based on the reference model for an Open Archival Information System (OAIS). See International Organization for Standardization (ISO), "ISO 14721:2003: Space data and information transfer systems - Open archival information system - Reference model," Geneva, Switzerland: ISO, 2003.

11 K. Sollins and L. Masinter, "Functional requirements for Uniform Resource Names," RFC 1737, IETF, December 1994 http://tools.ietf.org/html/rfc1737 (accessed January 22, 2008).

12 M. Mealling, "Dynamic Delegation Discovery System (DDDS) Part one: The comprehensive DDDS," RFC 3401, IETF, October 2002 http://tools.ietf.org/html/rfc3401; "Dynamic Delegation Discovery System (DDDS) Part two: The algorithm," RFC 3402, IETF, October 2002 http://tools.ietf.org/html/rfc3402; "Dynamic Delegation Discovery System (DDDS) Part three: The Domain Name System (DNS) database," RFC 3403, IETF, October 2002 http://tools.ietf.org/html/rfc3403; "Dynamic Delegation Discovery System (DDDS) Part four: The Uniform Resource Identifiers (URI) resolution application," RFC 3404, IETF, October 2002 http://tools.ietf.org/html/rfc3404; and "Dynamic Delegation Discovery System (DDDS) Part five: URI.ARPA assignment procedures," RFC 3405, IETF, October 2002 http://tools.ietf.org/html/rfc3405 (accessed January 22, 2008).

13 R. Daniel, "A trivial convention for using HTTP in URN resolution," RFC 2169, IETF, June 1997 http://tools.ietf.org/html/rfc2169 (accessed January 28, 2008).

14 The Internet Assigned Numbers Authority (IANA) registry is available at http://iana.org/assignments/urn-namespaces (accessed January 22, 2008).

15 S. Weibel, E. Jul, and K. Shafer, "PURLs: Persistent Uniform Resource Locators," http://purl.oclc.org/docs/new_purl_summary.html (accessed January 22, 2008).

16 K. Shafer, S. Weibel, E. Jul, and J. Fausey, "Introduction to Persistent Uniform Resource Locators," http://purl.oclc.org/docs/inet96.html (accessed January 22, 2008).

17 OCLC PURL Resolver software is available at www.oclc.org/research/projects/purl/download.htm (accessed January 22, 2008).

18 S. Sun, L. Lannom, and B. Boesch, "Handle System overview," RFC 3650, IETF, November 2003 http://tools.ietf.org/html/rfc3650 (accessed January 22, 2008).

19 S. Sun, S. Reilly, and L. Lannom, "Handle System namespace and service definition," RFC 3651, IETF, November 2003 http://tools.ietf.org/html/rfc3651 (accessed January 22, 2008).

20 S. Sun, S. Reilly, L. Lannom, and J. Petrone, "Handle System protocol (ver 2.1) specification," RFC 3652, IETF, November 2003 http://tools.ietf.org/html/rfc3652 (accessed January 22, 2008).

21 Corporation for National Research Initiatives (CNRI), "HANDLE.NET Services: Global Handle Registry®," Handle System, www.handle.net/introduction.html (accessed January 22, 2008).

22 CNRI, "Proxy server system," Handle System, www.handle.net/proxy.html (accessed January 22, 2008).

23 HANDLE.NET software is available at www.handle.net/download.html (accessed January 22, 2008).

24 International DOI Foundation (IDF), DOI® handbook, Version 4.4.1, October 5, 2006 www.doi.org/hb.html (accessed January 22, 2008).

25 The document is released for comment as "Committee Draft ISO/CD 26324, Information and documentation - Digital object identifier (DOI)," available at www.doi.org/ISO_Standard/sc9n475.pdf (accessed February 3, 2008).

26 National Information Standards Organization (NISO), "ANSI/NISO Z39.84-2005, Syntax for the Digital Object Identifier," September 30, 2005 www.niso.org/standards/resources/Z39-84-2005.pdf (accessed January 22, 2008).

27 For a description of the relationship between system components, see IDF, "Value added by the DOI system," Version 2 www.doi.org/factsheets/0607ValueAdded.pdf (accessed January 22, 2008).

28 See "Appendix 5 DOI Resource Metadata Declaration," and "Appendix 6 DOI Kernel Metadata Declaration: XML schema," in IDF, DOI® handbook, Version 4.4.1, October 5, 2006 www.doi.org/hb.html (accessed January 22, 2008).

29 See "Appendix 4 indecs Data Dictionary," in IDF, DOI® handbook, Version 4.4.1, October 5, 2006 www.doi.org/hb.html (accessed January 22, 2008).

30 For a description of the relationship between the systems, see IDF, "DOI® System and the Handle System®," Version 4.1 www.doi.org/factsheets/0607DOIHandle4-1.pdf (accessed January 22, 2008).

31 The plug-in is available at www.handle.net/other_software.html (accessed January 22, 2008).

32 The proxy server is available at http://dx.doi.org/ (accessed January 22, 2008).

33 For a description of system policies, see "Chapter 6 Policy," in IDF, DOI® handbook, Version 4.4.1, October 5, 2006 www.doi.org/hb.html (accessed January 22, 2008).

34 J.A. Kunze and R.P.C. Rodgers, "The ARK persistent identifier scheme," CDL-Draft, July 24, 2007 www.cdlib.org/inside/diglib/ark/arkspec.pdf (accessed January 22, 2008).

35 K. Gamiel, J. Kunze, and N. Nassar, "THUMP - The HTTP URL Mapping Protocol," Internet-Draft, February 24, 2007 http://tools.ietf.org/html/draft-kunze-thump-02 (accessed January 28, 2008).

36 See J.A. Kunze, "A metadata kernel for electronic permanence," October 5, 2001 http://jodi.tamu.edu/Articles/v02/i02/Kunze/kunze-final.pdf (accessed January 22, 2008); and J. Kunze and A. Turner, "Kernel metadata and Electronic Resource Citations (ERCs)," Draft, Dublin Core Metadata Initiative (DCMI) Kernel Metadata Task Group, August 11, 2007 http://dot.ucop.edu/home/jak/erc.html (accessed January 22, 2008).

37 J. Kunze and T. Baker, "The Dublin Core metadata element set," RFC 5013, IETF, August 2007 http://tools.ietf.org/html/rfc5013 (accessed February 4, 2008).

38 J.A. Kunze and M.A. Russell, "[noid (Nice Opaque Identifier) minting and binding tool]," April 19, 2006 www.cdlib.org/inside/diglib/ark/noid.pdf (accessed January 22, 2008).

39 NAAN / NAMH lookup tables are available at www.cdlib.org/inside/diglib/ark/natab (accessed January 22, 2008).

40 H. Van de Sompel and O. Beit-Arie, "Open linking in the scholarly information environment using the OpenURL framework," D-Lib Magazine (March 2001) www.dlib.org/dlib/march01/vandesompel/03vandesompel.html (accessed January 28, 2008).

41 Ex Libris SFX www.exlibrisgroup.com/sfx.htm (accessed January 28, 2008).

42 NISO, "ANSI/NISO Z39.88-2004, The OpenURL framework for context-sensitive services," April 15, 2007 www.niso.org/standards/resources/Z39_88_2004.pdf (accessed January 28, 2008).

43 A. Apps and R. MacIntyre, "Why OpenURL?" D-Lib Magazine (May 2006) www.dlib.org/dlib/may06/apps/05apps.html (accessed January 28, 2008).

44 Information about the OCLC OpenURL Resolver Registry is available at www.oclc.org/productworks/urlresolver.htm (accessed January 29, 2008).

45 OCLC OpenURL 1.0 is available at www.oclc.org/research/software/openurl/default.htm.

46 See J. Hakala, "Using National Bibliography Numbers as Uniform Resource Names," RFC 3188, IETF, October 2001 http://tools.ietf.org/html/rfc3188 (accessed January 28, 2008).

47 Networked European Deposit Library, http://nedlib.kb.nl/index.html (accessed January 29, 2008).

48 K. Best, "A URN namespace for OASIS," RFC 3121, IETF, June 2001 http://tools.ietf.org/html/rfc3121 (accessed January 28, 2008).

49 This report is available as Cooperative Online Serials (CONSER), "Report - CONSER PURL pilot," www.loc.gov/acq/conser/purl/purlrept.pdf (accessed January 28, 2008).

50 OCLC News Release, July 22, 2007 www.oclc.org/news/releases/200669.htm (accessed January 29, 2008).

51 Library of Congress (LC), "Handle Server," May 4, 1998 http://lcweb2.loc.gov/ammem/award/docs/handle-server.html (accessed January 29, 2008).

52 For more information on CrossRef and the DOI System, see A. Brand, "ALPSP Advice Note 37: CrossRef," January 2007 www.crossref.org/01company/pr/ALPSP%20Advice%20Note%2037.pdf (accessed January 28, 2008); and CrossRef, "DOI name information and guidelines," June 11, 2007 www.crossref.org/02publishers/doi-guidelines.pdf (accessed January 28, 2008).

53 C. Brown, Manager, Journals Program, National Research Council (NRC) Research Press, telephone conversation, September 21, 2007.

54 J.A. Kunze, "Towards electronic persistence using ARK identifiers," July 2003 www.cdlib.org/inside/diglib/ark/arkcdl.pdf (accessed January 28, 2008).

55 California Digital Library (CDL), Archival Resource Key (ARK) www.cdlib.org/inside/diglib/ark/ (accessed January 28, 2008).

56 E. Bermès, "Persistent identifiers for digital resources: The experience of the National Library of France," International Preservation News (IPN) 40 (December 2006): 22-34 www.ifla.org.sg/VI/4/news/ipnn40.pdf (accessed February 3, 2008).