Database Preparation Services

Preliminary Steps

Putting It All Together

Important preliminary steps in preparing a database for a local system include bringing together the library's records into a single file, verifying that all records are present, and validating bibliographic data to ensure it is readable and that record structures conform to the MARC communications format.

It is common for a library's database to contain records from different sources. These sources might include records from a CD-ROM database, retrospective conversion records from a commercial vendor or network, and records generated as a by-product of cataloging on a bibliographic utility. In the last case, records may come from the institution's individual tape subscription or an OCLC Local Database Creation tape, or they may be extracted from a network's multi-institutional tape.

Regardless of where records originated, the library should be able to furnish information identifying record sources, the number of logical records, time periods covered for each tape or category, and, if processing chronology is important, the sequence in which records must be loaded.

Catalog Card Information vs. MARC Record

Records on archive tapes derived from a bibliographic utility do not necessarily contain the same information that appears on catalog cards. Data that a bibliographic utility automatically adds to or deletes from catalog cards is not added to or deleted from the corresponding tape record. Print constants (e.g., CONTENTS:), certain ISBD punctuation, brackets surrounding uniform titles, and the like are supplied by the utility's card print program. Other information on printed cards, such as automatic stamps above or below call numbers, oversize designations, and the presence or absence of non-LC subject headings, results from the library's catalog card profile.

Are The Records Readable?

To test data integrity, LTI checks each record as it is read from tape. Checks include verifying record length from information contained in the leader, parsing the record directory, and testing that the record directory contains only numeric data. Records failing the verification process are weeded from the database and, if possible, printed out and returned to the library for re-processing. If the library's records are derived from a "suspect" source, a further check is made for characters that have had their "high bits" stripped and for the presence of characters not defined in the ALA or ANSEL character sets.
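The structural checks described above can be sketched in a few lines. This is an illustration only, not LTI's software: the function name check_record and the wording of the problem messages are assumptions made for the example. It relies on the standard MARC layout, in which a 24-byte leader carries the record length (positions 00-04) and the base address of data (positions 12-16), followed by a directory of 12-byte entries. Following the text, the sketch requires every directory byte to be numeric.

```python
FIELD_TERMINATOR = 0x1E
RECORD_TERMINATOR = 0x1D
LEADER_LEN = 24
DIR_ENTRY_LEN = 12  # 3-byte tag + 4-byte field length + 5-byte start position


def check_record(raw: bytes) -> list:
    """Return a list of problems found; an empty list means the record passed.

    Hypothetical helper for illustration, not a vendor implementation.
    """
    if len(raw) < LEADER_LEN + 1:
        return ["record shorter than a leader"]
    leader = raw[:LEADER_LEN]
    # Leader 00-04 holds the record length, 12-16 the base address of data.
    if not leader[0:5].isdigit() or not leader[12:17].isdigit():
        return ["non-numeric length or base address in leader"]
    record_length = int(leader[0:5])
    base_address = int(leader[12:17])

    problems = []
    if record_length != len(raw):
        problems.append("leader length does not match bytes read")
    if raw[-1] != RECORD_TERMINATOR:
        problems.append("missing record terminator")
    if not (LEADER_LEN < base_address <= len(raw)):
        return problems + ["base address out of range"]

    # The directory sits between the leader and the base address and is
    # closed by a single field terminator.
    directory = raw[LEADER_LEN:base_address - 1]
    if raw[base_address - 1] != FIELD_TERMINATOR:
        problems.append("directory not closed by a field terminator")
    if len(directory) % DIR_ENTRY_LEN != 0:
        problems.append("directory length is not a multiple of 12")
    for i in range(0, len(directory) - DIR_ENTRY_LEN + 1, DIR_ENTRY_LEN):
        entry = directory[i:i + DIR_ENTRY_LEN]
        if not entry.isdigit():
            problems.append(f"non-numeric directory entry at offset {i}")
    return problems
```

A record that returns a non-empty list would be set aside for printing and return to the library rather than passed on to later processing.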

Record Extraction And Sorting

There are many situations in which database processing is required on only a subset of records. Consequently, the vendor must have the ability to extract records from a tape or disk file based on a variety of selection criteria. An obvious extraction need occurs when a library's records are combined with those of other libraries. In this case, the vendor must be able to isolate records based on holding library symbol or some other identification code. Printing custom edit lists and performing database clean-up operations each require different extraction criteria.

LTI's software makes it possible to select records based on the occurrence or non-occurrence of fixed field data elements; tags, indicators, and subfield codes; a single character or character string in any variable field; or a combination of the above. For example, records consisting of French language serials published in Quebec after 1950 can be extracted for printing a special purpose bibliography.
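One way to picture this kind of combined selection is as small tests that can be joined together. The sketch below is an assumption-laden illustration, not LTI's software: records are simplified to dictionaries holding a leader string, an 008 string, and a list of (tag, value) pairs, and every function name is hypothetical. The fixed-field positions used for the Quebec example (Leader/07 for bibliographic level, 008/07-10 for the first date, 008/15-17 for place of publication, 008/35-37 for language) and the codes "quc" and "fre" follow MARC conventions.

```python
from typing import Callable

Record = dict  # assumed shape: {"leader": str, "008": str, "fields": [(tag, value), ...]}
Predicate = Callable[[Record], bool]


def fixed_field(name: str, start: int, end: int, wanted: str) -> Predicate:
    """Match a slice of the leader or 008 against an expected code."""
    return lambda rec: rec.get(name, "")[start:end] == wanted


def field_contains(tag: str, text: str) -> Predicate:
    """Match a character string anywhere in a variable field with this tag."""
    return lambda rec: any(
        t == tag and text in value for t, value in rec.get("fields", [])
    )


def published_after(year: int) -> Predicate:
    """Match when the first date in 008/07-10 is numeric and later than year."""
    def test(rec: Record) -> bool:
        date1 = rec.get("008", "")[7:11]
        return date1.isdigit() and int(date1) > year
    return test


def negate(pred: Predicate) -> Predicate:
    """Select on non-occurrence rather than occurrence."""
    return lambda rec: not pred(rec)


def all_of(*preds: Predicate) -> Predicate:
    """Combine several tests; every one must hold."""
    return lambda rec: all(p(rec) for p in preds)


def extract(records, pred: Predicate) -> list:
    """Return only the records that satisfy the combined test."""
    return [rec for rec in records if pred(rec)]


# French-language serials published in Quebec after 1950.
french_quebec_serials = all_of(
    fixed_field("leader", 7, 8, "s"),
    fixed_field("008", 35, 38, "fre"),
    fixed_field("008", 15, 18, "quc"),
    published_after(1950),
)
```

Calling extract(records, french_quebec_serials) would then yield the subset needed for the special purpose bibliography.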

Following extraction of the desired records, the records must be sorted into an arrangement useful for processing. Common sort orders include date of record creation, record ID or control number from the 001 field, call number (including LC, DDC, NLM, or SuDocs), main entry, title, and subject heading. Generally, when records are sorted, only an index key is extracted and sorted. Each index key is then mapped to the precise physical address (byte offset) of its record in the file.
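The key-and-offset approach can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any vendor's code: build_index, read_in_order, and the key_of callback are hypothetical names, and the sketch assumes each record begins with its five-digit length, as the MARC leader does.

```python
def build_index(stream, key_of):
    """Scan the file once and return sorted (key, offset, length) entries.

    key_of is a caller-supplied function (hypothetical) that extracts the
    sort key, such as a call number or control number, from one record.
    """
    index = []
    while True:
        offset = stream.tell()
        head = stream.read(5)  # first five bytes give the record length
        if len(head) < 5:
            break
        length = int(head)
        record = head + stream.read(length - 5)
        index.append((key_of(record), offset, length))
    index.sort()
    return index


def read_in_order(stream, index):
    """Yield records in index order by seeking to each stored byte offset."""
    for _key, offset, length in index:
        stream.seek(offset)
        yield stream.read(length)
```

Because only the small index entries are sorted, several indexes with different keys can be built over the same file, and the records themselves are rewritten only when output in a particular order is actually needed.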

In this way the vendor can create the multiple indexes required for different processing operations without rearranging or rewriting the records. There may be times, however, when a library needs to receive its database with records written in a specified sort order. For example, the library may want to load its database into a local system in shelf list order because the local system software does not accommodate a secondary display sort. Your database vendor should have the capability of writing records to tape in whatever sort order is desired.

The date/time of record use is critical for deduping a library's database. LTI offers OCLC libraries a special "date/time of use transaction" sort based on information appearing in control fields. This allows records created after June 1981 to be arranged properly by transaction date if previous processing has disturbed the original sequence. The date/time of record use is not present in OCLC records created prior to June 1981. In these records an artificial date/time stamp is used to determine transaction sequence relative to other records.
