Library Symbol Naming Authority Separator Character

IPIG List Discussion 7 - 9 April 1999


with Relevant 16 May Vote Comments

REVISED 31 May 1999; Comment of 3 May by Linda Driver, RLG, Added

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ed Davidson, FDI, raised the issue on 7 April 1999:

We at FDI have encountered a problem in parsing library symbols. We have discovered that it is not uncommon (in Australia at least) to have library symbols that contain colons (":") as part of the library symbol proper. This gives our software an interesting problem in knowing when the text before the colon is a naming authority and when it's just part of a symbol without a naming authority. (We encounter these in MARC holdings records etc., and our software can't always know in advance whether to expect an appended naming authority or not.)

Before we go down the path of writing complex algorithms to solve this problem, we thought that really the problem is with the naming authority separator itself. Wouldn't it be preferable to choose a separator that is unlikely to ever be used in a library symbol (like vertical-bar "|" or up-arrow "^")?

Slashes (back and forward facing), colon, semicolon, dot and dash are all likely to be part of legitimate existing library symbols.

As we are reviewing the IPIG profile at the moment, can we consider/discuss changing the naming authority separator symbol?

Sorry to have raised this so late in the day - but it's something we've discovered during customer testing.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mark Wilson, TLC:

Ed, that is actually a very good idea and simplifies parsing. I certainly see a vertical bar '|' as an excellent replacement for the more commonly used colon.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Joe Zeeman, CGI for ILL ASMA

Another option would be to combine two characters. Such a combination would be less likely to occur in "real" symbols. We could use :: or ||. A third option would be to escape internal occurrences of our delimiter character: if the symbol is "QU:123" we could transfer "NLA:QU\:123".

NLA|QU:123

NLA::QU:123

NLA||QU:123

NLA:QU\:123

Of these I think I like the third representation the best. Is the | character universally available for input/display these days? It may not be common on non-US keyboards (although I have it on my notebook).
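The escaped form (NLA:QU\:123) can be produced mechanically. The following is a minimal Python sketch of the sending side only, not anything agreed by IPIG. The function name join_symbol is hypothetical, and escaping the backslash itself is an assumption added here so that the encoding stays reversible; the thread does not address it.

```python
SEP = ":"    # naming authority separator under discussion
ESC = "\\"   # escape character from Joe's example

def join_symbol(authority, symbol):
    """Return 'authority:symbol' with any SEP or ESC inside either part escaped."""
    def escape(part):
        # Escape the escape character first (an assumption, see lead-in),
        # then the separator, so the two passes cannot interfere.
        return part.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)
    return escape(authority) + SEP + escape(symbol)

print(join_symbol("NLA", "QU:123"))
```

With the symbol "QU:123" and the authority "NLA", this prints the fourth representation listed above.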

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Jim MacDonald, OCLC:

I would like to use something other than the vertical bar character. There are some systems that cannot accept this character.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

John Bodfish, Ameritech

We prefer the third option (escaping the character) for these reasons:

Whatever character we pick, the "discovery" FDI made might revisit one of us. Escaping has the same advantage as the "two character" proposal (less likely to occur normally). It has virtually no chance of making existing data invalid.

We do have to mind that some of us have live data to tend to.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Jim responded to John:

The escape character does solve the problems with the symbols; however, it might be a problem for barcoding the initialRequestId portion of the transactionID. I would prefer to use something that can be barcoded easily. Extended Code 39 barcoding would probably handle it, but I doubt any other barcodes would.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

John answered Jim:

I hadn't thought about bar-coding; I'm glad somebody's paying attention.

I do think IPIG needs to continue the discussion of transaction id length and scanning issues at the next meeting. Not only would I learn more about the subject but perhaps we can save ourselves (I mean, our users) a lot of trouble.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ruth Moulton comments on Joe's options and Jim's response to John:

Also support the 'escape character' solution - for the reasons given already.

I'm not sure I understand the bar coding issue (mentioned by Jim in his response to John). Could you elaborate?

Any character can be used as an escape, by the way; it doesn't have to be '\' - this is just what many systems have used as a de facto convention (is this what the issue is?)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ed responded to Joe's suggestion:

I think I like the third option the least - it means lots of string mangling whenever we deal with a symbol - I'd prefer options one or two (Double Colon looks like a good option). I'd still best prefer an agreed single char that is not likely to be in a symbol - Up-Arrow if not Vertical-Bar would suit.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ruth Moulton comments on Ed's response to Joe:

The advantage of an escape mechanism is that you never have to worry if a separator is a character used in a symbol or not. If the separator exists in the symbol simply escape it (this only affects sending and receiving the strings in the APDUs doesn't it - once a string is parsed then the elements are stored in different fields and the need for the separator disappears?).

This is quite a common way to deal with separators and characters - and I don't believe it's difficult to write code to do this (a lot easier than the BER decoding/encoding!!) - I'd suggest taking some code out of existing publicly available sources, but actually it's easier to write it than look for it...
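To give a sense of how little code the receiving side might need, here is a minimal Python sketch of parsing an escaped string back into its two elements. It is an illustration under assumptions, not an IPIG-specified algorithm: the function name split_symbol is hypothetical, backslash is assumed as the escape character, and a string with no unescaped separator is assumed to be a bare symbol with no naming authority.

```python
def split_symbol(text, sep=":", esc="\\"):
    """Split 'authority:symbol' at the first unescaped separator.

    Returns (authority, symbol); authority is None when no unescaped
    separator is present. Escapes are removed from both parts.
    """
    parts = [[]]
    chars = iter(text)
    for ch in chars:
        if ch == esc:
            # The character after an escape is always taken literally.
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("escape character at end of string")
            parts[-1].append(nxt)
        elif ch == sep and len(parts) == 1:
            # First unescaped separator: start collecting the symbol.
            parts.append([])
        else:
            # Any later unescaped separator is kept as-is (a lenient choice).
            parts[-1].append(ch)
    joined = ["".join(p) for p in parts]
    if len(joined) == 1:
        return None, joined[0]
    return joined[0], joined[1]

print(split_symbol("NLA:QU\\:123"))
print(split_symbol("QU\\:123"))
```

Once the string is split this way, the two elements can be stored in separate fields and the escaping is no longer needed, as noted above.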

I know that there have been several responses to Ed's message, however I'd like to return to the original problem that Ed described. I'm not sure this is really an issue. IPIG has authorized the use of "identifier of name authority" in system-id. Assuming that the "identifier of name authority" doesn't include a colon--and it shouldn't--the first colon encountered will be the separator character.
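The first-colon rule described here is easy to state in code. The sketch below is a hypothetical Python illustration (the function name split_on_first_colon is not from any IPIG document), and it assumes that the name authority identifier never contains a colon.

```python
def split_on_first_colon(value):
    """Treat the first colon as the separator; everything after it is the symbol."""
    authority, sep, symbol = value.partition(":")
    if not sep:
        # No colon at all: no name authority was supplied.
        return None, value
    return authority, symbol

print(split_on_first_colon("NLA:QU:123"))  # authority present
print(split_on_first_colon("QU:123"))      # bare symbol containing a colon
```

The second call returns "QU" as the authority, so the rule only holds where the name authority is always present. Where the authority is optional, as in the holdings records Ed describes, the ambiguity remains.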

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Linda Driver, RLG, added a comment on May 3:

I don't expect to have to parse for "identifier of name authority" outside of system-id. As long as we don't create any identifiers with colons (and I assume this is under our control), we shouldn't encounter this problem.

Thus, I don't see why we can't continue to use the colon as the separator character. I guess the real question is why is the "identifier of name authority" used outside of system-id? Isn't this outside the scope of the IPIG-defined usage?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Comments on this issue from the record of the 16 May VOTE:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ruth Moulton:

1. System-Id.Person-Or-Institution-Symbol

The syntax for this should allow ':' to be part of the name authority or symbol string, preferably by escaping it in the string. (This needs to be an agenda item at the meeting as well.)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

RLG:

7) Section 7.3.1, Formatting rules -- RLG supports use of the colon to separate the identifier of the name authority from the institution or person symbol.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~