Library and Archives Canada

How to Participate in LAC's ETD Harvesting Program

At present, universities that participate in LAC's harvesting program can choose whether or not to also submit their electronic theses to LAC via ProQuest. For information on how to submit electronic theses to ProQuest, see Submit Electronic Theses to ProQuest.

1. General Requirements

1.1 Signing the Theses Non-Exclusive License

Because LAC is harvesting theses as well as metadata, the Copyright Act (http://laws.justice.gc.ca/en/C-42/index.html) requires that graduate students sign the Theses Non-Exclusive License.

If you are submitting your electronic theses via ProQuest as well as through LAC's harvesting program, mail the Theses Non-Exclusive License to ProQuest along with the ProQuest Subject Code Form and any required copyright permission letters. When ProQuest has processed the theses, it will forward the licenses to Theses Canada.

If your university has chosen not to send its electronic theses to ProQuest, send the licenses to Theses Canada. Copyright permission letters should be kept at the university. The licenses should be sent to LAC close to the time your e-theses are available to be harvested from the data repository.

1.2 Restricted Content

LAC does not harvest electronic theses that are restricted in whole or in part. Ensure that restricted theses are placed in a separate set on the server.

1.3 Record Deletion

LAC anticipates that metadata records for electronic theses will only be deleted on rare occasions and in exceptional circumstances. If it becomes necessary to delete a record from the data repository, contact Theses Canada so that all references to it can be removed from LAC's databases.

2. Technical Requirements

2.1 OAI Guidelines

For LAC to harvest your university's ETD metadata, you must set up an Open Archives Initiative (OAI) data repository. LAC requires that you use version 2.0 of the OAI Protocol for Metadata Harvesting (OAI-PMH) and include the OAI-identifier namespace in the "Identify" response (see 2.5).

Guidelines on how to implement an OAI data repository are available on the Open Archives Initiative website (www.openarchives.org).

2.2 Sets

It is important that you create one set that includes all and only unrestricted electronic theses. The ETD set can then be split into hierarchical subsets to organize your ETD collection. Please ensure that the set does not contain any ProQuest PDFs that are already in LAC's collection.

2.3 Metadata

LAC requires metadata in two formats: Dublin Core and ETD-ms. ETD-ms is the metadata standard that has been endorsed by the NDLTD. It is based on Dublin Core with several additional fields specific to theses. The standard is available on the NDLTD website (www.ndltd.org/standards/metadata/etd-ms-v1.00-rev2.html).

ETD-ms metadata is converted to MARC 21 and uploaded to the Theses Canada Portal and to AMICUS. On conversion to MARC 21, the loss of some special characters, formulae and math coding is unavoidable. As a result, the titles and abstracts in AMICUS may vary from the original. Once the metadata is available on the Theses Canada Portal and in AMICUS Web, the electronic theses can be accessed by the URL in the metadata.
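As an illustration of how the ETD set (2.2) and the two metadata formats fit together, the sketch below builds OAI-PMH 2.0 "ListRecords" request URLs of the kind a harvester might send to a repository. The base URL and the set name "etd" are hypothetical placeholders, and this is not a description of LAC's actual harvester; "oai_dc" is the standard OAI-PMH prefix for Dublin Core and "oai_etdms" is the prefix used for ETD-ms in this document.

```python
from urllib.parse import urlencode

# Hypothetical values: replace with your repository's base URL and set name.
BASE_URL = "https://repository.example.edu/oai/request"
ETD_SET = "etd"


def list_records_url(base_url: str, metadata_prefix: str, set_spec: str) -> str:
    """Build an OAI-PMH 2.0 ListRecords request URL limited to one set."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    })
    return f"{base_url}?{query}"


# One request per required metadata format.
for prefix in ("oai_dc", "oai_etdms"):
    print(list_records_url(BASE_URL, prefix, ETD_SET))
```

Opening both URLs in a browser is a quick way to confirm that your repository returns records in each format for the unrestricted set.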

2.4 TC (Theses Canada) Number

Each record in the oai_etdms metadata format needs an <identifier> element containing a TC number, which consists of a library symbol and a unique number. The TC number can be added immediately after the other identifiers. The TC number is not required in the Dublin Core metadata.

The TC number is created using the following format:

<identifier>TC-[Library Symbol] - [Unique Number]</identifier>

2.4.1 Library Symbol. Each library in Canada is assigned a symbol for use in bibliographic records, interlibrary loan, etc. (e.g., the symbol for Library and Archives Canada is OONL). If you do not know your university's library symbol, contact Theses Canada.

2.4.2 Unique Number. Any internal unique number that does not exceed 15 characters.
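A minimal sketch of how a repository might generate the TC number element is shown below. The function name is hypothetical, and the sketch assumes that the three parts are joined with plain hyphens and no surrounding spaces (for example TC-OONL-12345); confirm the exact form with Theses Canada before relying on it.

```python
from xml.sax.saxutils import escape


def make_tc_identifier(library_symbol: str, unique_number: str) -> str:
    """Return an <identifier> element holding a TC number.

    Assumption: the parts are joined as TC-SYMBOL-NUMBER with no spaces.
    """
    if not library_symbol:
        raise ValueError("library symbol is required")
    # Section 2.4.2: the unique number must not exceed 15 characters.
    if not unique_number or len(unique_number) > 15:
        raise ValueError("unique number must be 1 to 15 characters long")
    value = f"TC-{library_symbol}-{unique_number}"
    # Escape XML special characters in case the internal number contains any.
    return f"<identifier>{escape(value)}</identifier>"


print(make_tc_identifier("OONL", "12345"))
```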

2.5 OAI Identifier Element

LAC requires a specific <description> element that contains the <oai-identifier> element. These elements must be present in the metadata returned in a university's "Identify" response. The <oai-identifier> element and its metadata are needed to properly harvest and identify the university's electronic theses records. An example is shown below:

<description>
    <oai-identifier
        xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai-identifier
        http://www.openarchives.org/OAI/2.0/oai-identifier.xsd">
        <scheme>oai</scheme>
        <repositoryIdentifier>REPOSITORY URL</repositoryIdentifier>
        <delimiter>:</delimiter>
        <sampleIdentifier>oai:REPOSITORY URL:1993/94</sampleIdentifier>
    </oai-identifier>
</description>

2.6 Repository URL

In the <oai-identifier> example in 2.5, replace REPOSITORY URL with your repository's valid URL. LAC's Repository URL in its "Identify" responses is http://oai.collectionscanada.gc.ca/oai/oai.php.
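Before requesting a test harvest, it can be useful to confirm that your own "Identify" response actually carries the <oai-identifier> description. The sketch below parses a response with Python's standard library and returns the fields it finds. The sample response, the repository name and the host repository.example.edu are made up for illustration; the sketch also assumes the <description> element sits in the OAI-PMH 2.0 namespace, as it does in a complete response.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
OAI_ID_NS = "http://www.openarchives.org/OAI/2.0/oai-identifier"


def find_oai_identifier(identify_xml: str):
    """Return the oai-identifier fields of an Identify response, or None."""
    root = ET.fromstring(identify_xml)
    path = f".//{{{OAI_NS}}}description/{{{OAI_ID_NS}}}oai-identifier"
    node = root.find(path)
    if node is None:
        return None
    # Strip the namespace from each child tag and trim stray whitespace.
    return {child.tag.split("}", 1)[1]: (child.text or "").strip() for child in node}


# Made-up response for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example University Theses</repositoryName>
    <description>
      <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier">
        <scheme>oai</scheme>
        <repositoryIdentifier>repository.example.edu</repositoryIdentifier>
        <delimiter>:</delimiter>
        <sampleIdentifier>oai:repository.example.edu:1993/94</sampleIdentifier>
      </oai-identifier>
    </description>
  </Identify>
</OAI-PMH>"""

print(find_oai_identifier(SAMPLE))
```

A result of None means the description element is missing and the repository configuration should be reviewed.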

2.7 URL Identifier Element

For the URL identifier element, i.e. <identifier>URL</identifier>, LAC requires a direct link to the PDF file, ending with the ".pdf" extension.
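One way to catch links that point to a landing page instead of the file itself is a simple check like the one sketched below. The function name and the example URLs are hypothetical, and accepting ".PDF" in upper case is an assumption, not something LAC states.

```python
from urllib.parse import urlparse


def is_direct_pdf_link(url: str) -> bool:
    """True if the URL is http(s) and the whole URL ends with .pdf."""
    scheme = urlparse(url).scheme
    # Assumption: the extension is compared case-insensitively.
    return scheme in ("http", "https") and url.lower().endswith(".pdf")


print(is_direct_pdf_link("https://repository.example.edu/bitstream/1993/94/thesis.pdf"))
print(is_direct_pdf_link("https://repository.example.edu/handle/1993/94"))
```

Note that a URL such as ".../thesis.pdf?sequence=1" fails this check, because the link no longer ends with the extension.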

2.8 Date Granularity

You must use the UTC date granularity YYYY-MM-DDThh:mm:ssZ for the following OAI elements:

<responseDate>
<earliestDatestamp>
<datestamp>
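The sketch below shows one way to produce and check datestamps in this granularity with Python's standard library. The function names are hypothetical; the format string follows the YYYY-MM-DDThh:mm:ssZ pattern given above.

```python
import re
from datetime import datetime, timezone

# Pattern for the seconds-level UTC granularity YYYY-MM-DDThh:mm:ssZ.
UTC_SECONDS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def to_oai_datestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def has_seconds_granularity(value: str) -> bool:
    """Check that a datestamp string uses the required granularity."""
    return UTC_SECONDS.fullmatch(value) is not None


stamp = to_oai_datestamp(datetime(2009, 3, 5, 14, 30, 0, tzinfo=timezone.utc))
print(stamp, has_seconds_granularity(stamp))
print(has_seconds_granularity("2009-03-05"))  # day-level granularity is rejected
```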

2.9 OAI ETD-ms Validation

To facilitate the harvesting process, it is important that you validate your oai_etdms records against the following XSD before arranging to be harvested: www.ndltd.org/standards/metadata/etdms/1.0/etdms.xsd

2.10 Institutional Repository Software Upgrades

The LAC harvesting application depends on the stability of:

  • Record identifiers
  • Date ranges

If a university plans to change or update that information, it is important to contact Theses Canada ahead of time to ensure that the harvesting process does not fail. If something needs to be changed, Library and Archives Canada will coordinate efforts with the university to ensure continuity of service.

2.11 PDF Format

At present, LAC harvests electronic theses only in PDF-text format, as LAC's harvester can handle only single-file PDFs. PDF theses must be compatible with Adobe Acrobat version 5.0 or higher. LAC is also able to harvest documents in PDF/A format.

2.12 Validation of Metadata and Test Harvest

Once you have completed the technical requirements, validate your XML metadata using the E-Theses Validator. After fixing any resulting problems, contact Theses Canada to arrange for a test harvest of your university's metadata and ETDs.

2.13 Requirement Checklist

To ensure that your repository is ready to be harvested, review each of the following requirements.

General Requirements (mark each box when completed)

  • [ ] Theses Non-Exclusive Licenses signed by authors have been submitted (see par. 1.1)
  • [ ] All electronic theses for harvesting have been included in a single set (see par. 2.2)
  • [ ] A single PDF for each thesis was created and is compatible with Adobe Acrobat version 5.0 or higher (see par. 2.11)

Metadata Requirements (mark each box when completed)

  • [ ] OAI-DC (Dublin Core) metadata is provided (see par. 2.3)
  • [ ] OAI-ETD-ms metadata is provided (see par. 2.3)
  • [ ] A valid TC number is provided and contains your library symbol and a unique internal number not exceeding 15 characters (see par. 2.4)
  • [ ] The oai-identifier element is provided in the "Identify" response (see par. 2.5)
  • [ ] A valid URL is provided giving direct access to the PDF file (see par. 2.7)
  • [ ] The required UTC date granularity is used (see par. 2.8)
  • [ ] oai_etdms records have been validated against the XSD (see par. 2.9)