Database Preparation Services
USMARC Format & Magnetic Tapes

USMARC Format

The standards for the representation and exchange of bibliographic, authority, and holdings data in machine-readable form in the United States are the three USMARC communications formats for Bibliographic Data, Authority Data, and Holdings Data. A USMARC record, hereafter referred to simply as MARC, is composed of three elements: record structure, content designation, and data content. The record structure is an implementation of the American National Standard for Bibliographic Information Interchange (ANSI Z39.2). Content designation (the codes and conventions established to identify data elements within a record and to support manipulation of that data) is defined by each of the MARC formats. The actual data content of a MARC record is defined by standards outside the bibliographic formats, including cataloging codes, classification schedules, and controlled subject heading lists.

At the individual record level, the main components of a MARC record are a leader (24 characters), a record directory (12 characters for each variable field), variable control fields, and variable data fields. Each variable field ends with a field terminator character, and the last field ends with a field terminator character followed by a record terminator character.

If your library's database contains records that do not conform to the MARC standard, it may be possible to upgrade them to MARC without the time and expense of re-keying the data. The preferred method of upgrading non-MARC records to MARC is to extract control numbers (e.g., Library of Congress Card Number, ISBN) from the non-MARC file. These keys are then matched against a database of full MARC records. Most vendors should be able to move local call number and holdings information from the original record into the MARC record.
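The record layout and the control-number matching described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not any vendor's actual procedure: the function names are hypothetical, the data is assumed to be plain ASCII, and the terminator bytes, subfield delimiter, and the use of tag 010 subfield a for the Library of Congress number come from the MARC format documentation rather than from this page. Real matching would also need to normalize control numbers before comparing them.

```python
FIELD_TERMINATOR = b"\x1e"    # ends the directory and every variable field
RECORD_TERMINATOR = b"\x1d"   # follows the last field terminator
SUBFIELD_DELIMITER = "\x1f"   # precedes each subfield code in a data field


def build_record(fields):
    """Assemble a minimal record from (tag, data) pairs, for the demo only."""
    directory, body = b"", b""
    for tag, data in fields:
        field = data.encode("ascii") + FIELD_TERMINATOR
        # 12-character entry: tag (3), field length (4), starting offset (5)
        directory += f"{tag}{len(field):04d}{len(body):05d}".encode("ascii")
        body += field
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)            # where the first field begins
    total = base + len(body) + 1          # +1 for the record terminator
    leader = f"{total:05d}nam  22{base:05d}   4500".encode("ascii")
    return leader + directory + body + RECORD_TERMINATOR


def parse_record(record):
    """Return {tag: [field data, ...]} by walking the leader and directory."""
    leader = record[:24]
    base = int(leader[12:17])             # base address of data
    directory = record[24:base - 1]       # drop the directory's terminator
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        data = record[base + start:base + start + length - 1]
        fields.setdefault(tag, []).append(data.decode("ascii"))
    return fields


def subfield(data, code):
    """Return the first subfield with the given code, or None."""
    for chunk in data.split(SUBFIELD_DELIMITER)[1:]:
        if chunk[:1] == code:
            return chunk[1:].strip()
    return None


def match_by_lccn(local_keys, marc_records):
    """Map each key taken from the non-MARC file to a full record, or None."""
    index = {}
    for rec in marc_records:
        for data in parse_record(rec).get("010", []):
            lccn = subfield(data, "a")
            if lccn:
                index[lccn] = rec
    return {key: index.get(key) for key in local_keys}


# Made-up sample data: one full record and two keys from a non-MARC file.
sample = build_record([
    ("001", "ocm00000001"),
    ("010", "  \x1fa   85012345 "),
    ("245", "10\x1faExample title"),
])
for key, rec in match_by_lccn(["85012345", "99999999"], [sample]).items():
    print(key, "matched" if rec else "no match")
```

Keys that come back unmatched correspond to the titles that would have to be converted in-house or through a retrospective conversion service.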
Assuming a high overlap between the library's non-MARC records and the MARC record database, this MARC upgrade strategy puts a minimal burden on library staff and offers the library a database composed of full MARC records. Titles not found in the MARC file can be converted either in-house or through a retrospective conversion service.

A second approach is to convert non-MARC records directly into MARC. The success of this customized service depends on the consistency and fullness of data elements in the non-MARC records. As with control number matching against a database of full MARC records, this method permits the entire database of non-MARC records to be upgraded rapidly with little or no library involvement. The disadvantage is that the converted records may not be as useful, because non-MARC records tend to reflect abbreviated and substandard cataloging.

Magnetic Tape Specifications

In accord with the above-mentioned ANSI standard for Bibliographic Information Interchange (Z39.2), MARC records are generally written to 1/2" 9-track magnetic tape, with data coded in ASCII rather than EBCDIC. ANSI standard tape labels precede and follow the MARC record data file. While the storage capacity and data transfer speed of 9-track magnetic tape pale in comparison with those of more modern magnetic media (such as 4mm DAT tape or 8mm cartridge tape), bibliographic utilities, local system vendors, and library database preparation vendors cling to the old 9-track reel-to-reel tape for a variety of reasons. These include well-established standards for how data is encoded on tape, existing programs for reading and writing data, and the universal acceptance of 9-track by the industry. Conventions for writing bibliographic data to 4mm DAT tape are far less standardized.

Library vendors import and export catalog records written at 1600 or 6250 bytes (characters) per inch, abbreviated as BPI or CPI.
The most common density at which data is written to tape is 1600 BPI, and tape drives installed in libraries often read and write data only at this density. Use of 800 BPI is considered obsolete and is rarely supported.

Most bibliographic utilities and vendors transfer library files with data written in the OCLC-MARC format using unblocked, variable-length records. An interblock gap, or blank space, of about 1/2" separates each physical record. Except when a catalog record exceeds 2048 characters, there is a one-to-one correspondence between physical and logical (i.e., bibliographic) records. Other database suppliers, including the Library of Congress, write records in a blocked (2048 bytes per block) and spanned format. Blocking and spanning substantially increase the tape's storage capacity. As with tape density, a vendor should be able to handle both blocked and unblocked tapes. If the library is given a choice between receiving its records in OCLC-MARC or LC-MARC format, it should choose the former, since some vendors may not be able to read blocked and spanned tapes.

How Many Records Fit on a Tape?

Tape is available in reels of 600, 1200, or 2400 feet. Extra-length tape of 3600 feet is manufactured but seldom used because the tape's thinness reduces its strength. The number of records that can be stored on a magnetic tape depends on BPI density, length of the tape, and record size. Assuming an average MARC record length of 800 bytes, about 20,000 records can be written to a 2400 ft. tape at 1600 BPI. The same tape written at 6250 BPI holds about 60,000 records.

Tape Storage

Ideally, tape reels should be hung vertically in a temperature- and humidity-controlled environment, as generally found in a data processing facility. If possible, they should be "refreshed," or rewritten to new tape, every two to three years. In practice, data stored on tape is surprisingly robust.
If kept away from magnetic fields, direct sunlight, and heat or humidity extremes, tapes can be stored safely for years and read with few or no data errors. In cases where data errors are present, your vendor may be able to save sufficient information to identify bad records for reprocessing.
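The capacity figures given under "How Many Records Fit on a Tape?" can be checked with a rough calculation. The sketch below is a back-of-the-envelope estimate, not a vendor formula: the function name and parameters are hypothetical, every record is treated as exactly the average length, and tape labels, leader, and trailer are ignored. The 0.5" gap comes from the text above. The shorter 0.3" gap used for the 6250 BPI case is an assumption about typical drives at that density.

```python
def records_per_tape(tape_feet, bpi, avg_record_bytes=800,
                     gap_inches=0.5, block_bytes=None):
    """Estimate how many records fit on a reel (theoretical upper bound)."""
    tape_inches = tape_feet * 12
    if block_bytes is None:
        # Unblocked: every record is followed by its own interblock gap.
        inches_per_record = avg_record_bytes / bpi + gap_inches
        return int(tape_inches / inches_per_record)
    # Blocked and spanned: records are packed end to end into fixed-size
    # blocks, so there is only one gap per block.
    inches_per_block = block_bytes / bpi + gap_inches
    blocks = int(tape_inches / inches_per_block)
    return blocks * block_bytes // avg_record_bytes


print(records_per_tape(2400, 1600))                    # unblocked, 1600 BPI
print(records_per_tape(2400, 6250, gap_inches=0.3))    # unblocked, 6250 BPI
print(records_per_tape(2400, 1600, block_bytes=2048))  # blocked, 1600 BPI
```

Under these assumptions the unblocked estimates come out somewhat higher than the rounded figures quoted above, presumably because real tapes lose some length to labels, leader, and unusable sections. The blocked estimate at 1600 BPI is noticeably higher than the unblocked one, which illustrates why blocking and spanning increase capacity.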