Z39.50 Keyword Searching of Bibliographic Systems: A Discussion Paper




Prepared by Fay Turner,
National Library of Canada
and
Joe Zeeman, CGI
May 14, 1998
Revised June 30, 1998


Keyword searching is a feature supported by most Z39.50 clients and servers. This type of search is very useful when the exact author, title or subject of an item is not known, or when the objective of the search is to identify a broad range of documents to which the keyword(s) apply.

When Z39.50 is used to request a keyword search, the origin (client) sends the target (server) a combination of Bib-1 attributes that represents that request. However, Z39.50 clients and servers do not agree on which combination of attributes specifies a keyword search. As a result, whether a keyword search is performed, and the nature of that search, depend not only on the client used but also on the server to which the query is sent. The unreliable and inconsistent results confuse end users and lead to frustration with, and mistrust of, the Z39.50 standard as a search tool.

For end users to feel positive about the standard and confident about the results of a Z39.50 keyword query, Z39.50 system developers need to agree on a common interpretation of keyword searching and the attribute combination to apply.


Traditional (Native) Keyword Searching

Keyword searching has been an important feature of most bibliographic systems. It is a powerful search option that allows the user to specify a search when he or she does not know the authoritative form for a subject heading or has incomplete or misremembered information about a title or author.

The characteristics of the keyword search facility for bibliographic systems include:

Many bibliographic systems have been enhanced to support all of these features while others support a large majority of them.


Z39.50 Keyword Searching

When Z39.50 is used as the standard for exchanging search requests among many different bibliographic systems, the developer of a Z39.50 client must choose, from the Bib-1 attribute set, the attribute types and values that tell the server a keyword search is expected. The server, however, may have its own interpretation of what a keyword search is, and in practice there is little uniformity in what servers recognize as one.

The differences in server behaviour can be attributed to a number of factors including:

Another factor is how the Use attribute ANY is applied. If ANY is part of a keyword search, the access points it is mapped to vary with how the database is indexed and with the server's interpretation of "commonly used access points", a phrase that is part of the semantics defined for ANY. For single-index systems such as SIRSI and BRS, an ANY search is very cheap; for databases with multiple indexes, searching several of them can be expensive. Consistent results for ANY searches would require specifying the indexes that ANY is searched against, but the architectural differences between search engines may make this objective hard to meet.
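A minimal sketch of the cost and consistency issue, assuming a hypothetical multi-index server that expands ANY (Use 1016) into one lookup per configured index and merges the hits. The index contents, the `ANY_INDEXES` configuration and the function names are illustrative assumptions, not taken from any of the systems named here.

```python
from typing import Dict, List, Set

# Hypothetical inverted indexes: index name -> word -> set of record ids.
Index = Dict[str, Set[int]]

INDEXES: Dict[str, Index] = {
    "name": {"turner": {1}, "zeeman": {2}},
    "title": {"keyword": {1, 3}, "searching": {1, 3}},
    "subject": {"cataloguing": {3}, "keyword": {4}},
}

# Assumed server configuration: the indexes that ANY (Use 1016) is mapped to.
ANY_INDEXES: List[str] = ["name", "title", "subject"]


def search_index(index_name: str, word: str) -> Set[int]:
    """Look up one word in one index (case-insensitive)."""
    return INDEXES[index_name].get(word.lower(), set())


def search_any(word: str, any_indexes: List[str] = ANY_INDEXES) -> Set[int]:
    """Expand ANY into one lookup per configured index and merge the hits.
    The work done grows with the number of indexes searched."""
    hits: Set[int] = set()
    for index_name in any_indexes:
        hits |= search_index(index_name, word)
    return hits


if __name__ == "__main__":
    print(sorted(search_any("keyword")))             # [1, 3, 4]
    print(sorted(search_any("keyword", ["title"])))  # [1, 3]
```

Two servers holding the same records but configured with different `ANY_INDEXES` lists return different result sets for the same ANY query, which is the inconsistency described above; a single-index system would perform one lookup regardless.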


Server Behaviour

Appendix A contains tables that specify for a sample of Z39.50 servers how each requires a keyword search to be specified in the query. An examination of the tables illustrates the variance in the interpretation of keyword searching by these servers.

For a server such as the AMICUS installation at the National Library of Canada, Position with value 3 (any position in field) and Completeness with value 1 (incomplete subfield) are required to initiate a keyword search. Other servers require that Position be omitted (Dialog), or that it have a value other than 1 (first in field) (MELVYL). For many servers the Completeness type is either not applicable or defaults to 1 (incomplete subfield).

Other servers recognize a keyword search solely from the Structure attribute value Word or Word list (MELVYL), while others make assumptions about Position and Completeness (LC).

Servers such as CARL always do a keyword search against their WORD index unless specific Use attributes are supplied.
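The divergence described above can be made concrete by encoding each server's rule as a small function and applying all of them to the same attribute combination. This is a simplified sketch built only from the behaviour reported in this paper and its appendix; the function names, the outcome labels ("keyword", "exact", "error", "unspecified") and the handling of cases the paper does not describe are assumptions.

```python
from typing import Callable, Dict

# Bib-1 attribute type numbers.
USE, RELATION, POSITION, STRUCTURE, TRUNCATION, COMPLETENESS = 1, 2, 3, 4, 5, 6

Attrs = Dict[int, int]  # attribute type -> attribute value


def amicus(attrs: Attrs) -> str:
    """AMICUS: Use is mandatory; a missing Position defaults to 1 (first in field)."""
    if USE not in attrs:
        return "error"
    return "keyword" if attrs.get(POSITION, 1) == 3 else "exact"


def dialog(attrs: Attrs) -> str:
    """Dialog: the query fails if Position or Completeness is supplied."""
    if POSITION in attrs or COMPLETENESS in attrs:
        return "error"
    return "keyword"


def melvyl(attrs: Attrs) -> str:
    """MELVYL: Position 1 searches the exact key index; Word or Word list signals keyword."""
    if attrs.get(POSITION) == 1:
        return "exact"
    if attrs.get(STRUCTURE) in (2, 6):
        return "keyword"
    return "unspecified"  # assumption: behaviour for other values is not described


def carl(attrs: Attrs) -> str:
    """CARL (simplified): attributes other than Use are ignored; browse mappings omitted."""
    return "keyword"


SERVERS: Dict[str, Callable[[Attrs], str]] = {
    "AMICUS": amicus, "Dialog": dialog, "MELVYL": melvyl, "CARL": carl,
}


def compare(attrs: Attrs) -> Dict[str, str]:
    """Apply every modelled server rule to the same attribute combination."""
    return {name: rule(attrs) for name, rule in SERVERS.items()}


if __name__ == "__main__":
    print(compare({USE: 1016, POSITION: 3, STRUCTURE: 2, COMPLETENESS: 1}))
    # {'AMICUS': 'keyword', 'Dialog': 'error', 'MELVYL': 'keyword', 'CARL': 'keyword'}
    print(compare({USE: 1016, STRUCTURE: 2}))
    # {'AMICUS': 'exact', 'Dialog': 'keyword', 'MELVYL': 'keyword', 'CARL': 'keyword'}
```

The first combination follows the AMICUS requirements and is rejected by Dialog; the second omits Position, as Dialog requires, and AMICUS then falls back to an exact search.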


Recommendation

The authors propose that the ZIG recommend the use of a standard semantic to specify keyword searching that consists of:

Position = any-position-in-field (3) or absent;
Relation = equal (3) or absent;
Completeness = incomplete-subfield (1) or absent;
Structure = word (2) (optionally servers may support phrase (1) to indicate a search for adjacent words); a term with structure "word" may contain only a single word;
Truncation = absent or as supported by the server for keyword searching; at least do-not-truncate (100) must be supported;
Use = as supported by the server for keyword searching; at least ANY (1016) must be supported; the elements searched by ANY are determined by the server, but support for at least name words, title words and subject heading words is recommended.
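A client or server developer could check an operand against this recommendation mechanically. The sketch below assumes attributes are held as a mapping from Bib-1 attribute type (1 Use, 2 Relation, 3 Position, 4 Structure, 5 Truncation, 6 Completeness) to value, and renders a conforming operand in the `@attr type=value` prefix query format (PQF) used by the YAZ toolkit. Truncation is not checked because the recommendation leaves it to the server; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Bib-1 attribute type numbers.
USE, RELATION, POSITION, STRUCTURE, TRUNCATION, COMPLETENESS = 1, 2, 3, 4, 5, 6

# Values permitted by the recommendation; None means the type may be absent.
ALLOWED: Dict[int, Set[Optional[int]]] = {
    POSITION: {None, 3},      # any-position-in-field
    RELATION: {None, 3},      # equal
    COMPLETENESS: {None, 1},  # incomplete-subfield
    STRUCTURE: {2, 1},        # word; phrase only if the server supports it
}


def check_keyword_term(attrs: Dict[int, int], term: str) -> List[str]:
    """List the ways an operand departs from the recommended keyword semantic."""
    problems: List[str] = []
    for attr_type, allowed in ALLOWED.items():
        value = attrs.get(attr_type)
        if value not in allowed:
            problems.append(f"attribute type {attr_type} has value {value}")
    if USE not in attrs:
        problems.append("no Use attribute")
    if attrs.get(STRUCTURE) == 2 and len(term.split()) != 1:
        problems.append("structure word requires a single word")
    return problems


def to_pqf(attrs: Dict[int, int], term: str) -> str:
    """Render one operand as '@attr type=value ... term' (PQF notation)."""
    parts = [f"@attr {t}={v}" for t, v in sorted(attrs.items())]
    parts.append(f'"{term}"' if " " in term else term)
    return " ".join(parts)


if __name__ == "__main__":
    recommended = {USE: 1016, RELATION: 3, POSITION: 3,
                   STRUCTURE: 2, TRUNCATION: 100, COMPLETENESS: 1}
    print(check_keyword_term(recommended, "canada"))  # []
    print(to_pqf(recommended, "canada"))
    # @attr 1=1016 @attr 2=3 @attr 3=3 @attr 4=2 @attr 5=100 @attr 6=1 canada
    print(check_keyword_term({USE: 1016, POSITION: 1, STRUCTURE: 2}, "national library"))
    # ['attribute type 3 has value 1', 'structure word requires a single word']
```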


Appendix A: Specifying a Keyword Search to Library-based Z39.50 Servers

Below are tables indicating the Bib-1 attribute combinations used by a number of Z39.50 servers for keyword searching. The only criterion for inclusion in this document was that sufficient information about the server was available to put in tabular format. Information was supplied by contacts within the individual organizations and system administrators for these servers or was gathered from Web sites. Our apologies for any omissions or errors in interpretation of the information.

The values in grey cells do not need to be specified, as they have implicit or default values used by the server.


AMICUS

(National Library of Canada)
System Name | Search Types | Use | Relation | Position | Structure | Truncation | Completeness
AMICUS | Keyword title | 4-title | 3-equal | 3-any | 1-phrase; 2-word | 1-right; 100-do not truncate; 101-process # in term | 1-incomplete subfield
AMICUS | Keyword subject | 21-subject | 3-equal | 3-any | 1-phrase; 2-word | 1-right; 100-do not truncate; 101-process # in term | 1-incomplete subfield
AMICUS | Keyword note | 63-note | 3-equal | 3-any | 1-phrase; 2-word | 1-right; 100-do not truncate; 101-process # in term | 1-incomplete subfield
AMICUS | Keyword author | 1003-author | 3-equal | 3-any | 1-phrase; 2-word | 1-right; 100-do not truncate; 101-process # in term | 1-incomplete subfield
AMICUS | Keyword any | 1016-any | 3-equal | 3-any | 1-phrase; 2-word | 1-right; 100-do not truncate; 101-process # in term | 1-incomplete subfield
AMICUS | Keyword publisher | 1018-publisher | 3-equal | 3-any | 1-phrase; 2-word | 1-right; 100-do not truncate; 101-process # in term | 1-incomplete subfield

Note 1: If no value for Position is present in the query, AMICUS defaults to Position 1 (first in field) and therefore does not do a keyword search; instead it does an "exact" search (i.e., it searches for the term at the beginning of a field).
Note 2: Relation 3 (equal) is the default value and may be omitted; other values are not currently supported for keyword searching (they are supported for some other search types). The default value for structure is 1 (word); the default for truncation is 100 (do not truncate); the default for completeness is 1 (incomplete subfield).
Note 3: A Use value must be supplied; there is no default.
Note 4: Boolean operations on keywords are supported, as are proximity operations; keyword searches may be combined with other kinds of searches in a single query using Boolean operators.


CARL

System Name | Search Types | Use | Relation | Position | Structure | Completeness
CARL | Keyword | 21-subject; 31-date of publication; 1016-any; 1017-anywhere | 3-equal | 3-any | 1-phrase | 1-incomplete subfield

Note 1: For a keyword search, queries with Use attributes 1, 2, 3, 1002-1006 and 1009 are mapped to the NAME index and all others to the WORD index, with the following exceptions: Use attribute 5 is mapped to a Series browse; 9 to an LC Control Number browse; 4, 6 and 33-44 to a Title browse; and 16-19 and 53 to a Call Number browse.
Note 2: All attributes other than Use are ignored.
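Note 1 amounts to a lookup from Use value to index or browse. A sketch of that mapping as described, with a hypothetical function name; because Note 2 says the other attributes are ignored, only the Use value is an input.

```python
def carl_target(use: int) -> str:
    """Map a Bib-1 Use value to the CARL index or browse it is sent to (Note 1)."""
    if use == 5:
        return "Series browse"
    if use == 9:
        return "LC Control Number browse"
    if use in (4, 6) or 33 <= use <= 44:
        return "Title browse"
    if 16 <= use <= 19 or use == 53:
        return "Call Number browse"
    if use in (1, 2, 3, 1009) or 1002 <= use <= 1006:
        return "NAME index"
    return "WORD index"  # every other Use value gets a keyword search


if __name__ == "__main__":
    for use in (1016, 21, 1003, 4):
        print(use, carl_target(use))
```

Under this mapping, a Use 4 (title) operand goes to a Title browse rather than a keyword search, even though the table lists keyword searching for Use 21, 31, 1016 and 1017.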


DIALOG

System Name | Search Types | Use | Relation | Position | Structure | Truncation | Completeness
Dialog | Keyword | 12-local number; 1016-any | 1-6 | Omit | 1-phrase; 2-word; 6-word list | All values accepted | Omit

Note 1: The query will fail if values are provided for Position or Completeness.
Note 2: Structure will default to 2 (word) if no value provided


DRA

System Name | Search Types | Use | Relation | Position | Structure | Truncation | Completeness
DRA Classic Server | Keyword title | 4-title | 3-equal | 3-any | 6-word list; 1-phrase | 101-process # in search term; 1-right | 1-incomplete subfield
DRA Classic Server | Keyword title, series, subject, abstract, personal author, corporate author, conference author | 1016-any | 3-equal | 3-any | 6-word list; 1-phrase | 101-process # in search term; 1-right | 1-incomplete subfield

System Name | Search Types | Use | Relation | Position | Structure | Truncation | Completeness
DRA MultiLIS Server | Keyword title | 4-title | 3-equal | 3-any | 6-word list; 1-phrase | 101-process # in search term | 1-incomplete subfield
DRA MultiLIS Server | Keyword title, series, subject, abstract, personal author, corporate author, conference author | 1016-any | 3-equal | N/A | N/A | 101-process # in search term | 1-incomplete subfield

System Name | Search Types | Use | Relation | Position | Structure | Truncation | Completeness
DRA Inlex/3000 Server | Keyword title | 4-title | 3-equal | N/A | 1-phrase; 2-word; 6-word list | 101-process # in search term; 1-right; 100-none | N/A
DRA Inlex/3000 Server | Keyword series | 5-title series | 3-equal | N/A | 1-phrase; 2-word; 6-word list | 101-process # in search term; 1-right; 100-none | N/A
DRA Inlex/3000 Server | Keyword subject | 21-subject heading | 3-equal | N/A | 1-phrase; 2-word; 6-word list | 101-process # in search term; 1-right; 100-none | N/A
DRA Inlex/3000 Server | Keyword notes | 63-notes | 3-equal | N/A | 1-phrase; 2-word; 6-word list | 101-process # in search term; 1-right; 100-none | N/A
DRA Inlex/3000 Server | Keyword author | 1003-author; 1-personal name | 3-equal | N/A | 1-phrase; 2-word; 6-word list | 101-process # in search term; 1-right; 100-none | N/A
DRA Inlex/3000 Server | Keyword any | 1061-any; 1035-anywhere | 3-equal | N/A | 1-phrase; 2-word; 6-word list | 101-process # in search term; 1-right; 100-none | N/A
DRA Inlex/3000 Server | Keyword server choice (subject) | 1017-server choice | 3-equal | 3-any | 1-phrase; 2-word; 6-word list | 101-process # in search term; 1-right; 100-none | N/A



INNOPAC

System Name | Search Types | Use | Relation | Position | Structure | Truncation | Completeness
INNOPAC | Keyword | 1016-any | 3-equal | 3-any | 2-word | 100-do not truncate | 1-incomplete subfield

Note 1: As supported by CISTI's INNOPAC system


Library of Congress

System Name | Search Types | Use | Relation | Position | Structure | Truncation | Completeness
LC Z39.50 Server | Keyword personal name | 1-personal name | 3-equal | 3-any | 1-phrase; 2-word; 6-word list | 100-do not truncate | 1-incomplete subfield
LC Z39.50 Server | Keyword corporate name | 2-corporate name | 3-equal | 3-any | 1-phrase; 2-word; 6-word list | 100-do not truncate | 1-incomplete subfield
LC Z39.50 Server | Keyword title | 4-title | 3-equal | 3-any | 1-phrase; 2-word; 6-word list | 100-do not truncate | 1-incomplete subfield
LC Z39.50 Server | Keyword title-series | 5-title-series | 3-equal | 3-any | 1-phrase; 2-word; 6-word list | 100-do not truncate | 1-incomplete subfield
LC Z39.50 Server | Keyword subject | 21-subject | 3-equal | 3-any | 1-phrase; 2-word; 6-word list | 100-do not truncate | 1-incomplete subfield
LC Z39.50 Server | Keyword notes | 63-notes | 3-equal | 3-any | 1-phrase; 2-word; 6-word list | 100-do not truncate | 1-incomplete subfield
LC Z39.50 Server | Keyword any | 1016-any | 3-equal | 3-any | 1-phrase; 2-word; 6-word list | 100-do not truncate | 1-incomplete subfield
LC Z39.50 Server | Keyword (added since March 1998) | 3, 6, 33, 35-44, 58, 1002-1006, 1009, 1016, 1017, 1026, 1036 | 3-equal | 3-any | 1-phrase; 2-word; 6-word list | 100-do not truncate | 1-incomplete subfield

Note 1: If word list is used, all words must be present in the same field in the record
Note 2: If the words appear in separate operands (ANDed together), they can appear in different fields in the same record
Note 3: If Phrase is used, the phrase can appear anywhere in the field
Note 4: The default value for structure is 2-Word
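Notes 1 to 3 distinguish three matching rules. The sketch below illustrates them over a hypothetical record represented as a mapping from field tag to field text; the naive whitespace tokenizer and the function names are assumptions, and the LC server's actual normalization rules are not modelled.

```python
from typing import Dict, List, Set

# Hypothetical record model: field tag -> field text.
Record = Dict[str, str]


def words(text: str) -> List[str]:
    """Naive tokenizer: lower-case and split on whitespace."""
    return text.lower().split()


def match_word_list(record: Record, term: str) -> bool:
    """Structure 6 (word list): all words must occur in the same field."""
    wanted = set(words(term))
    return any(wanted <= set(words(text)) for text in record.values())


def match_anded(record: Record, terms: List[str]) -> bool:
    """Separate operands ANDed together: each word may be in a different field."""
    present: Set[str] = set()
    for text in record.values():
        present |= set(words(text))
    return all(term.lower() in present for term in terms)


def match_phrase(record: Record, term: str) -> bool:
    """Structure 1 (phrase): adjacent words, anywhere within one field."""
    wanted = words(term)
    n = len(wanted)
    for text in record.values():
        tokens = words(text)
        if any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)):
            return True
    return False


if __name__ == "__main__":
    record = {"100": "Fay Turner", "245": "Keyword searching of bibliographic systems"}
    print(match_word_list(record, "keyword turner"))     # False: words in two fields
    print(match_anded(record, ["keyword", "turner"]))     # True
    print(match_phrase(record, "bibliographic systems"))  # True
    print(match_phrase(record, "systems bibliographic"))  # False
```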


MELVYL

System Name | Search Types | Use | Relation | Position | Structure | Truncation | Completeness
MELVYL | Keyword title | 4-title | 3-equal | 3-any | 2-word; 6-word list | 1-right; 100-do not truncate; 101-process # in term | 1-incomplete subfield
MELVYL | Keyword subject | 21-subject; 24-INSPEC; 25-MESH; 26-PA; 29-local subject | 3-equal | 3-any | 2-word; 6-word list | 1-right; 100-do not truncate; 101-process # in term | 1-incomplete subfield
MELVYL | Keyword any | 1016-any | 3-equal | 3-any | 2-word; 6-word list | 1-right; 100-do not truncate; 101-process # in term | 1-incomplete subfield

Note 1: A query with Position 1 (first in field) results in an "exact" search (it searches the exact key index).
Note 2: Subject attributes 24, 25, 26 and 29 apply only to some databases.


Pennsylvania State University

System Name | Search Types | Use | Relation | Position | Structure | Truncation | Completeness
Penn State server | Keyword |  |  | 3-any |  |  | 
Penn State server | Keyword | 2-corporate name |  |  | 6-word list |  | 

Note: A keyword search is done if either Position is 3 (any position in field) OR the Use attribute is 2 (corporate name) and Structure is 6 (word list).
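The Penn State rule in the note can be written as a single predicate. A sketch, with a hypothetical function name and attributes held as a mapping from Bib-1 attribute type to value:

```python
from typing import Dict

# Bib-1 attribute type numbers used by the rule.
USE, POSITION, STRUCTURE = 1, 3, 4


def penn_state_is_keyword(attrs: Dict[int, int]) -> bool:
    """Keyword search if Position is 3, or if Use is 2 together with Structure 6."""
    any_position = attrs.get(POSITION) == 3
    corporate_word_list = attrs.get(USE) == 2 and attrs.get(STRUCTURE) == 6
    return any_position or corporate_word_list


if __name__ == "__main__":
    print(penn_state_is_keyword({USE: 1016, POSITION: 3}))  # True
    print(penn_state_is_keyword({USE: 2, STRUCTURE: 6}))    # True
    print(penn_state_is_keyword({USE: 1016, STRUCTURE: 2})) # False
```

One consequence: an operand such as `{USE: 1016, STRUCTURE: 2}`, which the recommendation permits because Position may be absent, is not treated as a keyword search by this rule.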


SIRSI

System Name | Search Types | Use | Relation | Position | Structure | Truncation | Completeness
Sirsi Unicorn server | Keyword | 1-9, 13, 16, 17, 20, 21, 25, 27, 29, 33, 35-41, 45, 50, 51, 53, 54, 56, 58-63, 1000, 1002-1006, 1008, 1009, 1016 | N/A | 1-first in field; 3-any | 1-phrase; 6-word list | 1-right; 100-do not truncate | 1-incomplete subfield; 3-complete field
Sirsi Unicorn server |  | 30-date; 31-date of publication | 1-less than; 2-less than or equal; 3-equal; 4-greater or equal; 5-greater than; 6-not equal |  | 4-year |  | 

Note 1: Although SIRSI provides sites with the default Z39.50 use attributes as described above, individual sites have the ability to add and/or remove support for specific use attributes and may also modify the default tag and subfield mapping associated with use attributes.
Note 2: When Sirsi's Infoview for Windows Z39.50 client is used to do a keyword search of the Unicorn server, if the terms supplied do not have a Boolean operator between them, the server assumes that the search should be for the terms within the same index (SAME operator) and not across all indexes. Other clients such as WebZ automatically insert an AND operator and the search is against all the indexes.
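Note 2 implies that a client can avoid relying on the server's implicit SAME behaviour by sending the Boolean operator itself, as WebZ does. A sketch that builds such a query in the YAZ prefix query format (PQF); the helper names are hypothetical, and the choice of Use 1016 with Position 3 (both listed in the SIRSI table) is an assumption, with the remaining attribute types left to the server's defaults.

```python
from typing import List


def keyword_operand(term: str, use: int = 1016) -> str:
    """One PQF operand: the given Use value plus Position 3 (any position in field).
    Other attribute types are left to server defaults (an assumption)."""
    return f"@attr 1={use} @attr 3=3 {term}"


def and_all(terms: List[str], use: int = 1016) -> str:
    """Join operands with explicit @and operators (prefix notation, left-nested)."""
    if not terms:
        raise ValueError("at least one term is required")
    query = keyword_operand(terms[0], use)
    for term in terms[1:]:
        query = f"@and {query} {keyword_operand(term, use)}"
    return query


if __name__ == "__main__":
    print(and_all(["keyword", "searching"]))
    # @and @attr 1=1016 @attr 3=3 keyword @attr 1=1016 @attr 3=3 searching
```

Because each operand carries its own attributes and the AND is explicit, the server does not have to guess whether the terms belong to the same index.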