XValidate - Cross-validate a set of files

XValidate attempts to perform a cross-validation on a set of files to look for inconsistencies. It takes up to two parameters:

XValidate [-n] [[@]filename]

where:

n is the validation level: this is currently '0' (or ignored) for full validation; '1' for middling validation; '2' for minimal validation (no longer used)
filename is the file to be cross-validated; most commonly this is used with the @ prefix to indicate a control file containing a list of file names
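
The command-line form above can be illustrated with a minimal sketch. This is not the program's own parsing code: the names Args and parse_args are hypothetical, and it assumes that omitting -n means level 0 (full validation).

```python
from typing import NamedTuple, Optional, Sequence


class Args(NamedTuple):
    level: int               # 0 = full, 1 = middling, 2 = minimal
    filename: Optional[str]  # file to validate, or the control file name
    is_control_file: bool    # True when the name carried an '@' prefix


def parse_args(argv: Sequence[str]) -> Args:
    """Parse 'XValidate [-n] [[@]filename]' (argv excludes the program name)."""
    level = 0  # assumption: no -n flag means full validation
    filename = None
    is_control = False
    for arg in argv:
        if arg.startswith("-"):
            if arg[1:] not in ("0", "1", "2"):
                raise ValueError(f"unknown validation level: {arg!r}")
            level = int(arg[1:])
        elif filename is None:
            is_control = arg.startswith("@")
            filename = arg[1:] if is_control else arg
        else:
            raise ValueError(f"unexpected argument: {arg!r}")
    return Args(level, filename, is_control)
```

For example, parse_args(["-1", "@files.ctl"]) yields level 1 with files.ctl treated as a control file (the file name is invented for illustration).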

Note that, if using a control file, the following control flags are supported:

-X:filename - this specifies additional exception files (instead of looking for xvalidate.xxx in the current folder). This allows cross-validation to be done for each index in turn and then for all the indexes together without (much) duplication of exception listings.
-V:n - this allows the validation level to be overridden for individual sections of the control file.
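
As a rough sketch of how these flags might be handled, the following assumes a control file with one entry per line, where -V:n applies to every file name that follows it until the next -V flag. The layout, the function name read_control_lines and the sample file names are all assumptions, not taken from the program.

```python
from typing import List, Tuple


def read_control_lines(lines: List[str], default_level: int = 0):
    """Return (files, exception_files); files is a list of (name, level)."""
    files: List[Tuple[str, int]] = []
    exception_files: List[str] = []
    level = default_level
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are skipped
        if line.startswith("-X:"):
            exception_files.append(line[3:])   # additional exception file
        elif line.startswith("-V:"):
            level = int(line[3:])              # level for the following section
        else:
            files.append((line, level))        # plain file name
    return files, exception_files


files, exceptions = read_control_lines(
    ["-X:mags.xxx", "a.idx", "-V:1", "b.idx", "c.idx"]
)
```

Here a.idx keeps the default level 0, while b.idx and c.idx are validated at level 1.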

Note also that, if a file contains DQE~VALFULL (even with qualifiers) then the validation level for that file is set to 0.

As discussed under SCAN_FILE, the (current) validation level is stored in each record. Then, when we are doing the comparisons (see below), the program sets up two control variables depending on the contents of the two records being compared - val_full and val_most. By default both are set to "True", implying all validation should be done. However:

If the author is "Anon." or starts with [, and the two records are in different files and for different publication details then val_full is reset to "False";
If the validation level for both records is not set to 0 and neither item type is a "major" type (vi,ss,nv,na,sl,pm,ts or n.) then both val_full and val_most are reset to "False".
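
The two rules above can be sketched as follows. The record layout (a dict with author, file, pub, level and type keys) is hypothetical, and the sketch reads "both records is not set to 0" as "neither record is at level 0", which is an assumption. It also takes the author from the first record, since the records being compared normally share an author.

```python
MAJOR_TYPES = {"vi", "ss", "nv", "na", "sl", "pm", "ts", "n."}


def validation_flags(a: dict, b: dict):
    """Return (val_full, val_most) for a pair of records."""
    val_full = True
    val_most = True

    # Rule 1: anonymous or bracketed author, different files,
    # different publication details.
    anonymous = a["author"] == "Anon." or a["author"].startswith("[")
    if anonymous and a["file"] != b["file"] and a["pub"] != b["pub"]:
        val_full = False

    # Rule 2: neither record at level 0 (assumed reading) and
    # neither item type is a "major" type.
    neither_full = a["level"] != 0 and b["level"] != 0
    neither_major = (a["type"] not in MAJOR_TYPES
                     and b["type"] not in MAJOR_TYPES)
    if neither_full and neither_major:
        val_full = False
        val_most = False

    return val_full, val_most
```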

This allows an incremental approach to cross-validation rather than trying to do it all at once.

Initially the program just calls SCAN_FILE for each file being cross-validated, building up data in the scandata structure. It then calls qsort to sort the magbuf and itmbuf arrays using the SCANITEM_COMP_RTN comparison routine for the latter. It then simply compares each scanitem record against the next one in the array. Generally speaking the author and compacted item title have to match before we do any comparison. However, one exception is if we have a reprint that comes from one of "our" files but doesn't appear to be specified therein. Specifically:

If the author, compacted title, publication details or item types differ between the two (and neither of the latter is "ex"); and
The new item is a reprint (i.e. edition is '2' which will always sort after the original appearance); and
We have at least a year and month for the date of original publication; and
The original publication was in one of "our" magazines (i.e. the magazine ID is in magbuf); and
It isn't an anonymous cover; and
The validation level for the new item is "0" or the item type is a "major" type (as discussed above).

In this case we output a diagnostic saying the original appearance is missing.
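
A minimal sketch of the "missing original appearance" test described above, applied to a record and its successor in the sorted array. All field names (edition, orig_year, orig_month, orig_mag, anonymous_cover and so on) are hypothetical stand-ins for whatever the scanitem record actually holds, and our_magazines stands in for the magazine IDs held in magbuf.

```python
MAJOR_TYPES = {"vi", "ss", "nv", "na", "sl", "pm", "ts", "n."}


def original_missing(prev: dict, new: dict, our_magazines: set) -> bool:
    """True if 'new' is a reprint whose original should be in our files."""
    keys = ("author", "title", "pub", "type")
    differs = any(prev[k] != new[k] for k in keys)
    neither_ex = prev["type"] != "ex" and new["type"] != "ex"
    has_date = (new["orig_year"] is not None
                and new["orig_month"] is not None)
    return (
        differs and neither_ex
        and new["edition"] == "2"               # the new item is a reprint
        and has_date                            # year and month are known
        and new["orig_mag"] in our_magazines    # original was in "our" files
        and not new["anonymous_cover"]
        and (new["level"] == 0 or new["type"] in MAJOR_TYPES)
    )
```

The caller would walk the sorted array pairwise, for example with zip(items, items[1:]), and emit the diagnostic whenever this returns True.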

Otherwise, assuming the author and compacted title match, we do the following checks. Note that, to minimise diagnostics, these are in a long If/Then/Else so that if one mismatch is found then the subsequent checks are omitted (even if we choose not to report the mismatch). The checks, in order, are then as follows:

If only one record is a dummy series entry (i.e. ends in '|') we do nothing;
Else we check if the publication details are the same but the real authors are different, but only report an error if val_full is set;
Else if the real authors are different we do nothing;
Else we check if the full titles are the same, but only report an error if val_full is set;
Else we check to see if the serial maximum differs for two entries in the same file (saving the title if so to avoid multiple identical errors);
Else, if we have a serial maximum, we check to see if the item types, title additional and series name are the same (while this partially duplicates checks later, we'll be bypassing them for serial parts so we need to do them here);
Else if the serial parts differ and/or are unknown then we do nothing;
Else we check to see if the item types are the same (ignoring differences if one or the other is "ex", "iw" or "br" or if the title is generic), but only report an error if val_full is set;
Else we check to see if the publication details differ, except that:
- we ignore dummy series entries;
- we ignore generic titles;
- we ignore minor item types (iw,as,av,bg,bi,br,cl,cn,cs,ct,ed,fp,fr,gp,gr,hd,hu,ia,il,in,is,iv,ix,lc,lr,mr,ms,ob,pi,pr,pt,pz,qa,qz,rc,rv,th or ??);
- we only report an error if val_most is set;
Else we check if the title additionals are the same, but only report an error if val_full is set;
Else, if we're doing dummy series entries, we do nothing else;
Else we check to see if the series names match;
Else we check to see if the original titles match;
Else we check to see if the bylines match for items with the same publication details;
Else we check to see if the co-authors match for items with the same publication details;
Else we check to see if the secondary names match;
Else we check to see if the appearance notes match;
Else we check to see if the ED notes match (allowing for differences caused by an appended magazine ID), but only report an error if val_full is set.
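
The shape of this chain, in which the first mismatch found stops all later checks even when it is not reported, can be sketched with a handful of the checks. This is an abbreviated illustration only: the field names (is_dummy, pub, real_author, full_title, series) and the diagnostic strings are hypothetical, and most of the checks listed above are omitted.

```python
def compare_pair(a: dict, b: dict, val_full: bool = True):
    """Return a diagnostic string, or None if nothing is reported."""
    if a["is_dummy"] != b["is_dummy"]:
        return None  # only one record is a dummy series entry
    elif a["pub"] == b["pub"] and a["real_author"] != b["real_author"]:
        return "real author mismatch" if val_full else None
    elif a["real_author"] != b["real_author"]:
        return None  # different real authors: nothing further to compare
    elif a["full_title"] != b["full_title"]:
        # The branch is taken even when val_full is False, so the
        # later checks are skipped whether or not we report.
        return "full title mismatch" if val_full else None
    # ... the intermediate checks are omitted from this sketch ...
    elif a["series"] != b["series"]:
        return "series name mismatch"
    return None
```

Note that with val_full set to False, a pair whose full titles and series names both differ produces no diagnostic at all, because the title branch is taken first and silently ends the chain.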