IdxGen Routines

Main Routines

IDXGEN consists of the following main routines:

The main program
PARSE_CONFIG - Read and Parse the Index Configuration File
SETUP_FILES - Set up the consolidated Magazine and Book Data Files
SETUP_BOKAUT_IDX - Set up all the anchors for the Book Author Index
SETUP_BOKTTL_IDX - Set up all the anchors for the Book Title Index
SETUP_ISSUE_IDX - Read all the files and set up the Issue Index
WRITE_LINKS_FILE - Create the files for links from the GCP Website
SETUP_STYTTL_IDX - Set up all the anchors for the Story Title Index
SETUP_STYAUT_IDX - Set up all the anchors for the Story Author Index
SETUP_ARTIST_IDX - Set up all the anchors for the Artist Index
SETUP_CHRON_IDX - Set up all the anchors for the Chronological Index
SETUP_SERIES_IDX - Set up all the anchors for the Series Index
SETUP_BIOG_NOTES - Set up all the anchors for the Biographical Notes
WRITE_TABLE_OF_CONTENTS - Write the Table of Contents File
BUILD_BOKAUT_IDX - Build the Book Author Index
BUILD_BOKTTL_IDX - Build the Book Title Index
BUILD_ISSUE_IDX - Build the Magazine Issue Index
BUILD_FULLTEXT_IDX - Build the Full-Text Index
BUILD_STYAUT_IDX - Build the Story Author Index
BUILD_STYTTL_IDX - Build the Story Title Index
BUILD_ARTIST_IDX - Build the Artist Index
BUILD_CHRON_IDX - Build the Chronological Index
BUILD_SERIES_IDX - Build the Series Index
BUILD_BIOG_NOTES - Build the Biographical Notes
WRITE_STATISTICS - Write the Statistics Page

which communicate via the Configuration Data structure and the Other Global Data. There are also a number of significant support routines:

CREATE_NAME_LINK - Create a new nameslink entry
CREATE_NEW_PAGE - Create a new page for the specified file
FLUSH_DREC - Read all following 'D' records and first 'E' record (if any).
FORMAT_AUTHOR - Format an Author Name as heading (with scanitem as input)
FORMAT_AUTHOR2 - Format an Author Name as heading (with AUTH structure as input)
FORMAT_AUTHTYPE - Format an Author Type as subheading
FORMAT_BOOK_DETAILS - Format book details into summary form
FORMAT_FULLTEXT_LINK - Format a Full Text link field
FORMAT_ISSUE_LINK - Format a set of publication details and add a link to the associated issue/book if it is in the current index.
FORMAT_MAGPUB - Format magazine issue details
FORMAT_NAMES - Format a names field.
FORMAT_NOTES - Format a notes field
FORMAT_PUBDATE - Format Pub. Info. date field
FORMAT_SERIES - Format series field
FORMAT_URL - Format a URL specified in an FM data file
GET_ANCHOR - Get the HTML to link to a specific anchor
GET_ANCHOR2 - Get the HTML to link to a name index anchor
GET_BOKTTL_INDEX - Find index in book title links table for a particular record number.
GET_NAME_INDEX - Get index of specified name (if possible)
GET_NAME_LINK - Get link to specified name (if possible)
GET_PUBDET_INDEX - Get offset of link in Issues Table for set of publication details.
GET_SERIES_LINK - Get link to specified series (if possible)
READ_BOOK_FILE - Read a specified book data file and output the contents to the consolidated book data file with a sort header.
READ_MAG_FILES - Read all the magazine data files (in order) and output the contents to the consolidated magazine data file with cover scan, full-text and about links added.
REPORT_DIAGS - Report Extended Diagnostics if required
SPLIT_TITLE - Split title field into constituent parts
TIDY_DNOTES - Tidy up contents of any 'D' Notes
WRITE_INDEX_LINE - Write a line to the intermediate index(es)
WRITE_PAGE_HEADER - Write a standard header to a file.
WRITE_PAGE_TRAILER - Write a standard trailer to a file.
WRITE_PUB_INFO - Format and output all "Pub Info" records
WRITE_STORY_ITEM - Write a single story item group.

Note that several of the BUILD_xxx routines revolve around the use of Boilerplate Files and the whole program relies on a sequence of Sorted Files. Note also that the program relies on a naming convention for the data files (discussed separately) and on an external sort command file.

Currently IDXGEN only supports UK 7-bit format input files - it wouldn't be rocket science to extend this to handle US format files as well but it isn't a priority at present.

The main program

The main program:

Gets the name of the Index Configuration File
Allocates the scandata structure
Opens the Index Configuration File (and diagnostics files) and calls PARSE_CONFIG to read and parse it
Opens the Temporary File (IdxGen.tmp in the indicated folder) and, if required, the diagnostic dump file (IdxGen.dmp in the same folder)
Calls SETUP_FILES to consolidate any specified magazine and/or book data files into individual files (IdxGen.mgs & IdxGen.bks), doing any necessary sorting and adding cover scan/full text links for magazine issues.
If we had any books it then:

- Calls SETUP_BOKAUT_IDX to read the book data file (IdxGen.bks) and store all the data in IdxGen.tmp, as well as to set up all the anchors for the Book Author Index
- Calls SETUP_BOKTTL_IDX to set up all the anchors for the Book Title Index
Calls SETUP_ISSUE_IDX to read the magazine data file (IdxGen.mgs) and store all the data in IdxGen.tmp, as well as to set up all the anchors for the Magazine Issue Index
Calls WRITE_LINKS_FILE to clean up the Issue Link Table and to write the Links File(s)
Calls SETUP_STYTTL_IDX to read the data from IdxGen.tmp, set up all the anchors for the Story Title Index and output the data including the story title anchors in IdxGen.tm4
Calls SETUP_STYAUT_IDX to sort IdxGen.tm4 into IdxGen.Aut and read it to set up all the anchors for the Story Author Index; it also outputs the data, including the story author anchors, in IdxGen.ttl; and the data, including artist information, in IdxGen.tm6
Calls SETUP_ARTIST_IDX to sort IdxGen.tm6 into IdxGen.Art, and to set up all the anchors for the Artist Index
Calls SETUP_CHRON_IDX to read IdxGen.ttl and extract the records needed for the Chronological Index into IdxGen.Crn, and to set up all the anchors for the Chronological Index
Calls SETUP_SERIES_IDX to read IdxGen.ttl and extract the records needed for the Series Index into IdxGen.Ser, and to set up all the anchors for the Series Index
Calls SETUP_BIOG_NOTES to set up the Biographical Notes Index
Calls WRITE_TABLE_OF_CONTENTS to write the Table of Contents File
If we had any books it then:
- Calls BUILD_BOKAUT_IDX to build the Book Author Index
- Calls BUILD_BOKTTL_IDX to build the Book Title Index
Calls BUILD_ISSUE_IDX to build the Magazine Issue Index and store all the full text links in IdxGen.ftx
Calls BUILD_FULLTEXT_IDX to read IdxGen.ftx and build the Full Text Index
Calls BUILD_STYAUT_IDX to build the Story Author Index
Calls BUILD_STYTTL_IDX to build the Story Title Index
Calls BUILD_ARTIST_IDX to build the Artist Index
Calls BUILD_CHRON_IDX to build the Chronological Index
Calls BUILD_SERIES_IDX to build the Series Index
Calls BUILD_BIOG_NOTES to build the Biographical Notes Index
Calls WRITE_STATISTICS to build the Statistics Page
Closes any open files, outputs usage summaries and elapsed time, and exits

PARSE_CONFIG - Read and Parse the Index Configuration File

This module simply reads the Index Configuration File, checking that all essential fields have been specified and defaulting any unspecified, optional, fields.

SETUP_FILES - Set up the list of Files

This module reads the list of files specified for the index and handles these differently depending on which type of file they are:

Each magazine data file is added to the magfiles section of the Configuration Data structure, where the magazine name pointer is set up by calling GET_ABB_NAME - note that this is a pointer to an internal table so there is no need to allocate memory for it.
Each additional reference file is also added to the magfiles section of the Configuration Data structure but the magazine name pointer points to a fixed string "~~~~", both to distinguish it and to sort it to the end if the magazine names are sorted.
For each book data file, the routine first checks whether we already have some books (got_books set) and, if not, opens the consolidated books data file (IdxGen.tm1 in the temporary folder). It then calls READ_BOOK_FILE to read the data from the books file and add it to the consolidated books data file with the appropriate sort header.

When all the files have been read, it checks whether this index wants the magazine names to be sorted and, if so, calls qsort to sort the magfiles array. It then calls READ_MAG_FILES to read all the magazine files and create the consolidated magazine data file (with added cover scan, full text and about link information).

It then checks whether we had some books and, if so, calls the sort routine to sort the consolidated books data file into order (as IdxGen.tm2 in the temporary folder). It then opens the sorted file and copies its contents to the real consolidated book data file (IdxGen.bks in the temporary folder), stripping off the sort header.
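The sort-header approach used for the book data (add a sort key to each record, sort, then strip the key off again while copying) can be illustrated with a minimal in-memory sketch in C. It is not the IdxGen code: the real program writes temporary files and uses an external sort command file, and the header layout assumed here (a fixed-width key, a TAB, then the record) and the names add_sort_header, cmp_lines and strip_sort_header are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sort-header layout: a key padded/truncated to KEY_WIDTH
   characters, a TAB, then the original record text. The real IdxGen
   header format is not described here. */
#define KEY_WIDTH 20

/* Write "key<TAB>record" into out. */
static void add_sort_header(char *out, size_t outlen, const char *key, const char *record)
{
    snprintf(out, outlen, "%-*.*s\t%s", KEY_WIDTH, KEY_WIDTH, key, record);
}

/* qsort comparator: because every key has the same width, a plain byte-wise
   strcmp orders the lines by key (ties fall back to the record text). */
static int cmp_lines(const void *a, const void *b)
{
    const char *const *la = a;
    const char *const *lb = b;
    return strcmp(*la, *lb);
}

/* Return the record text with the sort header stripped off. */
static const char *strip_sort_header(const char *line)
{
    const char *tab = strchr(line, '\t');
    return tab != NULL ? tab + 1 : line;
}
```

Padding every key to the same width means a plain byte-wise comparison is enough to put the records into key order, so nothing downstream needs to understand the header until it is stripped.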

SETUP_BOKAUT_IDX - Set up all the anchors for the Book Author Index

This module is only called if some books have been detected. It reads all the records in the consolidated book data file (IdxGen.bks) and:

Passes each record through to the SCAN_RECORD routine to add to/create the raw SCANITEM file.
Sets up the structure of the Book Author Index, storing links to the book IDs in the Issue Link Table
Stores links to the author entries in the Book Author Indexes in the internal bokautlink author structures and sorts the table

Note that:

It is essential that the code that counts lines and calculates when to throw a new page matches that in BUILD_BOKAUT_IDX so that anchors are created where and when they should be (the latter routine does sanity checks to ensure this is the case). This routine has to do this for both the listings and contents levels as follows:
- A new page is thrown at the contents level if we have passed the minimum page size and we have an 'A' record. Note that we should never need a continuation page at this level as we want each set of book contents on a single page, but we log a diagnostic (once per page) if we pass the maximum page size.
- A new page is thrown at the listing level if we have passed the minimum page size and the author name has changed, or we have passed the maximum page size. Note that the latter will require a continuation page.
We need an anchor for all books so that we can link the Story Author and Story Title entries back to them. If a book doesn't have a Book ID, we therefore generate a fake one of the form Fnnnnn#mm, where nnnnn is the current page number (padded to 5 digits) and mm is the current anchor number. In these cases we also need to create a fake "name" for the book we are linking back to, similar to those translated via ABBREV.CVT (via FORMAT_BOOK_DETAILS).
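A sketch of how the fake Book ID described above might be built with snprintf. The function name make_fake_book_id is hypothetical, and zero-padding the anchor number to two digits is an assumption read from the Fnnnnn#mm pattern rather than something the text specifies.

```c
#include <stddef.h>
#include <stdio.h>

/* Build a fake Book ID of the form Fnnnnn#mm for a book with no Book ID.
   nnnnn = current page number, padded to 5 digits (as documented);
   mm    = current anchor number; padding it to 2 digits is an assumption.
   Returns the number of characters written, or -1 if buf was too small. */
static int make_fake_book_id(char *buf, size_t buflen, int page_no, int anchor_no)
{
    int n = snprintf(buf, buflen, "F%05d#%02d", page_no, anchor_no);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```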

SETUP_BOKTTL_IDX - Set up all the anchors for the Book Title Index

This routine ...

SETUP_ISSUE_IDX - Read all the files and set up the Issue Index

This module reads all the records in the consolidated magazine data file (IdxGen.mgs) and:

Passes each record through to the SCAN_RECORD routine to add to the raw SCANITEM file (IdxGen.tmp).
Sets up the structure of the Magazine Issue Index, storing links to the magazine issues and feature records in the Issue Link Table
Stores links to the author entries in the Book Author Indexes in the internal author structures
Sorts the Issue Link Table before exiting
Adds special entries to the SCANITEM file for any 23/25/26 records in PSEUD.CVT so that we can add entries for them when needed.

Note that:

It is essential that the code that counts lines and calculates when to throw a new page matches that in BUILD_ISSUE_IDX so that anchors are created where and when they should be (the latter routine does sanity checks to ensure this is the case). This routine has to do this for both the listings and contents levels as follows:
- A new page is thrown at the contents level if we have passed the minimum page size and we have an 'A' record. Note that we should never need a continuation page at this level as we want each set of issue contents on a single page, but we log a diagnostic (once per page) if we pass the maximum page size.
- A new page is thrown at the listing level if we have passed the minimum page size and we have a new main features record, or we have passed the maximum page size. Note that the latter will require a continuation page.
We set up special anchors at the contents level for feature records so that we can handle cross-references. These start with a '%' followed by the magazine name so that they can't be confused with other entries.
Anchors are created at the contents levels for all 'A' records including features records (the latter to allow links from the GCP website). As there can be multiple features records with the same magazine ID this routine adds ^^nmmm to the abbreviation where 'n' is 1 for a main feature record and 2 for a sub-header and mmm is the current record number to handle multiple sub-headers with the same abbreviation. After the array is sorted this means that the first entry for an abbreviation is either the first sub-header record for that abbreviation or the main features record if there were no sub-headers. The first suffix is then stripped off by the WRITE_LINKS_FILE routine.
Although we need to create anchors for magazine features records at the listing level these are only ever used to link one level of an index to the next so there is no need to store them in the internal array (which avoids the problem that they might be duplicated).
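The listing-level page-throw rule shared by SETUP_BOKAUT_IDX, SETUP_ISSUE_IDX and the corresponding BUILD_xxx routines can be expressed as a small decision function. This is a hypothetical sketch: the names page_action and listing_page_action are invented, "passed" is taken to mean strictly greater than, and the caller is assumed to know whether the current record is a natural break point (a new main features record, or a change of author).

```c
#include <stdbool.h>

/* Possible outcomes when deciding whether to throw a new listing page. */
typedef enum {
    PAGE_KEEP,          /* stay on the current page */
    PAGE_NEW,           /* natural break: start a fresh page */
    PAGE_CONTINUATION   /* forced break: maximum passed, needs a continuation page */
} page_action;

/* Listing-level rule:
   - at a natural break point and past the minimum page size -> new page;
   - past the maximum page size regardless                    -> continuation page.
   A natural break is preferred, so no continuation is needed when both hold. */
static page_action listing_page_action(int line_count, int min_page, int max_page,
                                       bool at_break_point)
{
    if (at_break_point && line_count > min_page)
        return PAGE_NEW;
    if (line_count > max_page)
        return PAGE_CONTINUATION;
    return PAGE_KEEP;
}
```

Because the SETUP_xxx and BUILD_xxx routines must make identical decisions for the anchors to line up, keeping the rule in one function that both call would be one way to stop the two drifting apart.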

WRITE_LINKS_FILE - Create the files for links from the GCP Website

This routine creates the Magazine Links File from the feature record entries in the Issue Link Table added by SETUP_ISSUE_IDX. It simply creates the top-level file and then loops through the Issue Link Table looking for entries that start with four spaces and have an embedded ^^. As discussed above, there may be multiple entries for a given abbreviation, distinguished by a suffix after the ^^, so the routine strips the suffix from the first such entry and writes an entry for it to the appropriate Links File.
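The first-entry selection in WRITE_LINKS_FILE amounts to a single pass over the sorted table, remembering the last abbreviation written. The sketch below assumes the table is reduced to an array of key strings and collects the stripped abbreviations into an array instead of writing the Links File; the function name first_feature_links, the ABBREV_MAX limit and the sample abbreviations in the usage are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define ABBREV_MAX 64

/* From a sorted array of Issue Link Table keys, collect one entry per
   abbreviation: keys must start with four spaces and contain "^^"; only the
   first key for each abbreviation is used, with its "^^..." suffix removed.
   Returns the number of entries stored in out (at most max_out). */
static size_t first_feature_links(const char *const *keys, size_t nkeys,
                                  char out[][ABBREV_MAX], size_t max_out)
{
    char prev[ABBREV_MAX] = "";
    size_t count = 0;

    for (size_t i = 0; i < nkeys && count < max_out; i++) {
        const char *sep;
        if (strncmp(keys[i], "    ", 4) != 0 || (sep = strstr(keys[i], "^^")) == NULL)
            continue;                           /* not a feature-record key */

        size_t len = (size_t)(sep - keys[i]);
        if (len >= ABBREV_MAX)
            len = ABBREV_MAX - 1;               /* truncate defensively */

        char abbrev[ABBREV_MAX];
        memcpy(abbrev, keys[i], len);
        abbrev[len] = '\0';

        if (strcmp(abbrev, prev) == 0)
            continue;                           /* later entry for the same abbreviation */

        strcpy(prev, abbrev);
        strcpy(out[count++], abbrev);
    }
    return count;
}
```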

SETUP_STYTTL_IDX - Set up the Title Index

This module sorts the raw SCANITEM file (IdxGen.tmp) into title-order (as IdxGen.tm3). It then reads through the file setting up the structure of the Story Title Index, rewriting the file (as IdxGen.tm4) with the relevant links to the Story Title Index embedded.

SETUP_STYAUT_IDX - Set up the Story Author Index

This module sorts the title-order SCANITEM file into author-order. It then reads through the file setting up the structures of the Story Author and Artist Indexes, storing links for each author/artist in the internal author structures.

SETUP_ARTIST_IDX - Set up all the anchors for the Artist Index

This module ...

SETUP_CHRON_IDX - Set up the Chronological Index

This module sorts the title-order SCANITEM file into chronological-order. It then reads through the file setting up the structures of the Chronological Index, storing links for each author in the internal author structures.

SETUP_SERIES_IDX - Set up the Series Index

This module first needs to read through the file generated by SETUP_STYAUT_IDX (IdxGen.ttl), which is in Story Title Index order, and rewrite the entries that belong to a series (as IdxGen.tm8) in the order needed by the Series Index. This is complicated by a number of factors.

Firstly, if an item in a series also has a series prefix, it will appear in the input file twice (once with the prefix and once without), but we only want a single entry in the series index, so we ignore any titles that contain a "|" divider. Something similar happens with articles about a series that quote the author as a subject: these have entries both for the author of the article and for the subject, so we explicitly ignore the latter.

Secondly, if the item in question is part of a serial, it will appear in the input file for each part of the serial, and the convention is that we only list the first (known) part with ", etc." appended if there are multiple parts (currently this also applies to multiple stories with the same name, as per v1, but we might want to consider changing this). We use the val_level field in the scanitem structure to indicate this - it is set to '0' if it is not part of a serial and to '1' otherwise. However, we won't know until we read the "next" record whether the flag needs to be set for the "current" record so the routine has to play around with two records at the same time.
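The two-record technique for serials can be sketched as a loop that holds a "current" record and only finalises it once the next record has been read. In this sketch the series_rec structure, the collapse_serials function, and the test for "same item" (identical title and author) are all assumptions; only the val_level values '0' and '1' come from the text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down record: just the fields this sketch needs. */
typedef struct {
    const char *title;
    const char *author;
    char val_level;     /* '0' = not a serial, '1' = serial (list with ", etc.") */
} series_rec;

/* Assumed definition of "the same item": identical title and author. */
static int same_item(const series_rec *a, const series_rec *b)
{
    return strcmp(a->title, b->title) == 0 && strcmp(a->author, b->author) == 0;
}

/* Keep only the first part of each serial, flagging it with val_level '1'.
   The flag for the current record can only be decided once the next record
   has been seen, hence the held "current" copy. out must hold nin records.
   Returns the number of records written to out. */
static size_t collapse_serials(const series_rec *in, size_t nin, series_rec *out)
{
    if (nin == 0)
        return 0;

    size_t nout = 0;
    series_rec current = in[0];
    current.val_level = '0';

    for (size_t i = 1; i < nin; i++) {
        if (same_item(&current, &in[i])) {
            current.val_level = '1';    /* another part: mark current, drop this one */
        } else {
            out[nout++] = current;      /* current is now final */
            current = in[i];
            current.val_level = '0';
        }
    }
    out[nout++] = current;              /* flush the last held record */
    return nout;
}
```

When the records are written out, a val_level of '1' would be the cue to append ", etc." to the title.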

Thirdly, if the item in question is by multiple authors, it will appear in the input file once for each author. In simple cases we just pass all of these through to the sorted file and then list only the first instance of each item. A complication occurs if the item is by a house name or joint pseudonym where the real author(s) are known. In this case the item will appear in the input file for the house name/joint pseudonym as well as for the real author(s). For reasons that aren't currently clear, the former does not contain the information needed to create a proper entry (listing the house name/pseudonym as well as the real author(s)), which is a problem if it sorts before the other entries. To address this, the code checks whether the current item is for a pseudonym (nonempty secnam_ptr) and the next is for a real name (empty secnam_ptr) and, if so, ignores the current one in favour of the next. (This problem was noted for Petra Christian items authored by Christopher Priest on his own.)
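The pseudonym test itself is just a comparison of the secnam_ptr fields of the current and next records, applied inside the same two-record loop used for serials. The scanitem_view structure and the function names below are hypothetical stand-ins that model only that field; treating a NULL pointer the same as an empty string is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slice of the scanitem structure: only the field used here. */
typedef struct {
    const char *secnam_ptr;   /* non-empty for a house name / joint pseudonym entry */
} scanitem_view;

static bool is_empty(const char *s)
{
    return s == NULL || s[0] == '\0';
}

/* True if the current record should be dropped in favour of the next one:
   the current record is for a pseudonym and the next is for a real name. */
static bool prefer_next_record(const scanitem_view *current, const scanitem_view *next)
{
    return next != NULL && !is_empty(current->secnam_ptr) && is_empty(next->secnam_ptr);
}
```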

Fourthly, we want the items sorted in order of original publication date. If the original publication is in the file then it will appear first, but in some cases all we have in the file is a reprint so we need to sort out the original publication date, which might involve decoding a book abbreviation to see if there is a more precise date in ABBREV.CVT. In all cases it attempts to create a numeric version of the date so that we can sort by it.
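One way to create the numeric date used for sorting is to pack the year, month and day into a single YYYYMMDD number. The sketch assumes the date has already been decoded into integer parts, with 0 standing for an unknown month or day; the text does not say how IdxGen actually represents partial dates, and numeric_pub_date is a hypothetical name.

```c
/* Combine date parts into a single sortable number of the form YYYYMMDD.
   Unknown month or day are passed as 0 (an assumption), so "1953" sorts
   before "March 1953", which sorts before "14 March 1953".
   Returns -1 for out-of-range input. */
static long numeric_pub_date(int year, int month, int day)
{
    if (year < 0 || month < 0 || month > 12 || day < 0 || day > 31)
        return -1;
    return (long)year * 10000L + (long)month * 100L + (long)day;
}
```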

Any item may belong to multiple series simultaneously so the routine next needs to split the series field into individual series. Then, for each series:

It remembers any other series in the ednote_ptr field so we can add them later
It works out the author name(s) to be used when sorting by author (the default): this has a few complications:
- we want the name to be normalised, but we can't use nrmaut_ptr as that is the normalised form of the byline, whereas series are grouped under the real name(s), so we need to re-normalise the real names into the nrmaut_ptr field
- we have to handle "Anon." by hand to sort it to the end of the list
- we use CHK_SERIES to see if any of the authors in the current item are listed in SERIES.IDX as priority authors, prefixing the priority code to the normalised author string

It then processes the internal series records array, setting up the structure of the Series Index and storing a pointer to each series entry in the internal series structures.
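The "Anon." and priority-author handling described in the list above can be folded into the construction of the author sort key. This sketch borrows the "~~~~" trick used for additional reference files to push "Anon." to the end. The assumption that the normalised form is exactly "Anon.", the one-character priority code, and the function name series_author_key are all hypothetical, as CHK_SERIES and SERIES.IDX are not described here.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build an author sort key for the Series Index.
   - "Anon." is mapped to "~~~~" so that it sorts after any alphabetic name
     ('~' is the highest printable ASCII character).
   - priority is a hypothetical one-character code ('\0' if the author is not
     a priority author); it is simply prefixed, so a digit code sorts ahead of
     ordinary names.
   Returns buf. */
static char *series_author_key(char *buf, size_t buflen, const char *normalised_name, char priority)
{
    const char *name = strcmp(normalised_name, "Anon.") == 0 ? "~~~~" : normalised_name;

    if (priority != '\0')
        snprintf(buf, buflen, "%c%s", priority, name);
    else
        snprintf(buf, buflen, "%s", name);
    return buf;
}
```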

SETUP_BIOG_NOTES - Set up all the anchors for the Biographical Notes

This module ...

WRITE_TABLE_OF_CONTENTS - Write the Table of Contents File

BUILD_BOKAUT_IDX - Build the Book Author Index

This module ...

BUILD_BOKTTL_IDX - Build the Book Title Index

This module ...

BUILD_ISSUE_IDX - Build the Magazine Issue and Book Author Indexes

This module reads all the files defined in the (sorted) magazine file array followed by the consolidated book data file and creates the Magazine Issue and Book Author Indexes. This gets potentially complex as we are building four levels of index simultaneously:

The top-level index (a1.htm for Books and a3.htm for Magazines)
The middle-level index (pnnnnn.htm for both)
The bottom-level index (bnnnnn.htm for both)
The issue/book contents listings (tnnnnn.htm for both)

These are controlled by parallel sets of variables as follows:

xxxfil_ptr - the file pointer for the associated file at the specified level (where xxx = top, mid, bot or iss as appropriate)
xxxpage_count - the current page number at the specified level (mid, bot & iss only as the top-level index is always a single page)
xxxline_count - the current line number at the specified level (mid, bot & iss only as the top-level index is always a single page)
xxxanchor_cnt - the current anchor number for the current page at the specified level (bot & iss only as anchors aren't used in the other two)

For simplicity (?) the code is potentially executed twice - once for magazines and then again for books - on the assumption that the two passes have more in common than they have in differences (though it is debatable). For each pass, the code first creates (and writes a header to) the top-level file but, for simplicity, all the other files are created as needed based on the associated line_count and the values specified in the Configuration Data structure for minimum and maximum page sizes. Put simply:

For the issue/contents listings we count a line for every record read from the input data files and throw a new page when we encounter an 'A' record if we have exceeded the minimum page size, or unilaterally if we have exceeded the maximum page size.
For magazines, a single line is counted in the bottom-level index for each 'A' record and we throw a new page when we have a group header and have passed the minimum page size or unilaterally if we have exceeded the maximum page size.
For books, we have the unusual situation that books without contents will only be output to the bottom-level index while for books with contents only the A & D records will be output to the bottom-level index, so we count a line for each 'A' or 'D' record and throw a new page when we have a new author and have passed the minimum page size or unilaterally if we have exceeded the maximum page size.
For magazines, a single line is counted in the middle-level index for each 'A' record which is a features record and, again, we throw a new page when we have a group header and have passed the minimum page size or unilaterally if we have exceeded the maximum page size.
For books, a single line is counted in the middle-level index for each new author, plus one for any unilateral page throws in the bottom-level index (in which case we need to add a continuation record to the middle-level index) and we throw a new page when we have a new author and have passed the minimum page size or unilaterally if we have exceeded the maximum page size.
In each case, the top-level index simply has a line representing the first and last entries in each page of the middle-level index.

Note that, in the above, books edited by an author are treated as being the same as those written by the author.

One particular issue that arises with both sets of files is the handling of 'D' records for two reasons:

In the UK format the 'D' records might contain information that is listed with the 'A' records (e.g. cover artist; series name)
As discussed above, for book data files, whether or not the 'A' and 'D' records are output to the book contents listings depends on whether or not there are any 'E' records.

To handle this, whenever we encounter an 'A' record we call FLUSH_DREC to read all following 'D' records and the first (real) 'E' record if there is one.

BUILD_FULLTEXT_IDX - Build the Full-Text Index

This module ...

BUILD_STYAUT_IDX - Build the Story Author Index

This module reads the author-order SCANITEM data file and creates the Story Author Index.

BUILD_STYTTL_IDX - Build the Story Title Index

This module reads the title-order SCANITEM data file (IdxGen.ttl) and creates the Story Title Index. This is somewhat different to most of the build routines as the bottom level of the index is actually the Level 1 Index rather than the Listings Level.

BUILD_ARTIST_IDX - Build the Artist Index

This module ...

BUILD_CHRON_IDX - Build the Chronological Index

This module reads the chronological-order SCANITEM data file and creates the Chronological Index.

BUILD_SERIES_IDX - Build the Series Index

This module reads the internal series record array and creates the Series Index.

BUILD_BIOG_NOTES - Build the Biographical Notes

This module ...

WRITE_STATISTICS - Write the Statistics Page

This module ...

CREATE_NAME_LINK - Create a new nameslink entry

/************************************************************************/
/*									*/
/*  CREATE_NAME_LINK - Create a new nameslink entry			*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	linksub = CREATE_NAME_LINK (scanitem_ptr, prtfil_ptr);		*/
/*									*/
/*  Where:								*/
/*									*/
/*	linksub	     = Subscript into nameslink table of (main) item	*/
/*		     = -1 if an error occurred				*/
/*	scanitem_ptr = Pointer to SCANITEM structure			*/
/*    	prtfil_ptr   = Pointer to diagnostics file (may be NULL)	*/
/*									*/
/*  This routine allocates an entry in the Names Link table for the	*/
/*  name specified in the supplied scanitem.				*/
/*									*/
/************************************************************************/

This routine allocates a new entry in the Names Link Table, setting up the name from the normalised name in the scanitem passed as a parameter. It calls FIND_NAME to set up the PSEUD.CVT entry (if any) and initialises all the other fields.

CREATE_NEW_PAGE - Create a new page for the specified file

/************************************************************************/
/*									*/
/*  CREATE_NEW_PAGE - Create a new page for the specified file		*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	status = CREATE_NEW_PAGE (config_ptr, outfil_ptr, page_count,	*/
/*				  prefix_ptr, pagttl_ptr, prtfil_ptr);	*/
/*									*/
/*  Where:								*/
/*									*/
/*	status	   = Result of operation:				*/
/*		   = PSP_TRUE if OK; else PSP_FALSE			*/
/*	config_ptr = Pointer to CONFIG_DATA structure			*/
/*	outfil_ptr = Pointer to file pointer for creating new page	*/
/*	page_count = Current page count					*/
/*	prefix_ptr = File/Folder prefix letter				*/
/*	pagttl_ptr = Title of page					*/
/*	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/************************************************************************/

All the HTML pages in the index follow the same basic format so this routine is called whenever we want to create a new page in an index level. If we have a current page (i.e. page_count > 1) it calls WRITE_PAGE_TRAILER to write the trailer to it and closes the old file. It then opens a new one and calls WRITE_PAGE_HEADER to write the page header to it. Note that, as the routine opens a new file (and hence creates a new file handle) the file parameter passed to it is a pointer to the file handle rather than the file handle itself. Note also that this routine depends on the value of config_ptr->curr_index_type when writing the page headers and trailers to indicate the home link.
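The close-old, open-new pattern of CREATE_NEW_PAGE, and the reason it takes a pointer to the file handle, can be seen in this sketch. The header and trailer text, the file naming (prefix string, five-digit page number, .htm, in the current folder) and the 1/0 return values standing in for PSP_TRUE/PSP_FALSE are assumptions; config_ptr and prtfil_ptr are omitted.

```c
#include <stdio.h>

/* Hypothetical stand-ins for WRITE_PAGE_HEADER / WRITE_PAGE_TRAILER. */
static void write_page_header(FILE *fp, const char *title)
{
    fprintf(fp, "<HTML><HEAD><TITLE>%s</TITLE></HEAD><BODY>\n", title);
}

static void write_page_trailer(FILE *fp)
{
    fprintf(fp, "</BODY></HTML>\n");
}

/* Finish the current page (if any) and open the next one. outfil_ptr is a
   FILE ** because the routine replaces the caller's file handle.
   Returns 1 on success, 0 on failure. */
static int create_new_page(FILE **outfil_ptr, int page_count, const char *prefix, const char *title)
{
    char name[64];

    if (page_count > 1 && *outfil_ptr != NULL) {    /* we have a current page */
        write_page_trailer(*outfil_ptr);
        fclose(*outfil_ptr);
        *outfil_ptr = NULL;
    }

    snprintf(name, sizeof name, "%s%05d.htm", prefix, page_count);
    *outfil_ptr = fopen(name, "w");
    if (*outfil_ptr == NULL)
        return 0;

    write_page_header(*outfil_ptr, title);
    return 1;
}
```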

FLUSH_DREC - Read all following 'D' records and first 'E' record (if any)

/************************************************************************/
/*									*/
/*  FLUSH_DREC - Read all 'D' records and 1st 'E' record		*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	status = FLUSH_DREC (inpfil_ptr, linecnt_ptr, note_ptr_ptr,	*/
/*			     cvartist_ptr_ptr, series_ptr_ptr,		*/
/*			     flags_ptr_ptr, nxtrec_ptr_ptr, prtfil_ptr)	*/
/*									*/
/*  Where:								*/
/*									*/
/*	status		 = Routine return status:			*/
/*			 = PSP_TRUE if everything OK			*/
/*			 = PSP_FALSE if we hit an error			*/
/*	inpfil_ptr	 = Pointer to input file			*/
/*	linecnt_ptr	 = Pointer to line count to update		*/
/*	note_ptr_ptr	 = Pointer to buffer to hold pointer to notes	*/
/*	cvartist_ptr_ptr = Pointer to buffer to hold pointer to artist	*/
/*	series_ptr_ptr	 = Pointer to buffer to hold pointer to series	*/
/*	flags_ptr_ptr	 = Pointer to buffer to hold pointer to flags	*/
/*	nxtrec_ptr_ptr	 = Pointer to buffer to hold pointer to record	*/
/*    	prtfil_ptr	 = Pointer to diagnostics file (may be NULL)	*/
/*									*/
/************************************************************************/

There are a number of places where we need to know what the contents of any following 'D' records are when we process an 'A' record. It is also important to concatenate all the text in the 'D' records into a single buffer before calling CVT_TXT_TO_HTML so that it can correctly check for mismatched commands like "Bold On". This routine is called whenever we encounter a new 'A' record to read through and parse all the following 'D' records. As it won't know when we've run out of 'D' records it also has to read the first record after the 'D' records and return that to the caller. Because of this we are also able to parse any front cover records specified on a 'cv' record (see below). Some notes to bear in mind are:

All the information being returned is built up in local static buffers, pointers to which are returned to the caller. As such, it is important that the returned information is either processed or stored elsewhere before the routine is called again.
As this routine is reading records from the input file it needs to update the line count to ensure that the anchors generated in BUILD_ISSUE_IDX match those stored in SETUP_ISSUE_IDX
The flags buffer is set up from any DQE records encountered and any partial data indicators (see below).
The series buffer is set up from any DS records encountered.
The cover artist buffer is set up either from a DC record or from a record that has "E fc.A0" in the record type field and an item type of "cv". Note that this will not handle a cover artist 'E' record that has a following EB note. The data returned is in the form Artist~Title~Title Additional~Publication Details~, where Publication Details does not include the item type.
Any 'E' record with a title of " Need Contents" is ignored, both because we don't want to display it and because it might precede a cover artist record.
Any DP or DH records are translated in the same way as in CVTLOCUS.
Any DE, DN or DQx records are ignored.
All other D records are just concatenated into the buffer. In addition if any of them starts "Incomplete Data " then "PARTIAL" is added to the flags buffer.
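The routing of 'D' records into static buffers can be sketched as follows. This is a simplified, hypothetical model: records are held in an array of rec structures with a type string such as "DS" or "DQE", the separators used when joining entries are invented, and all 'E' record handling (cover artist "cv" records, " Need Contents") is left out. The CVTLOCUS-style DP/DH translation is also not reproduced, so those records simply fall through to plain concatenation. Instead of reading the next record from a file, the function leaves *pos at the first record that is not a 'D' record.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BUF_MAX 512

/* Hypothetical in-memory record. */
typedef struct {
    const char *type;   /* record type code, e.g. "D", "DS", "DQE", "A", "E" */
    const char *text;   /* record text */
} rec;

/* Results live in static buffers, as in FLUSH_DREC: they are only valid
   until the next call. */
static char notes_buf[BUF_MAX], series_buf[BUF_MAX], artist_buf[BUF_MAX], flags_buf[BUF_MAX];

/* Append text to buf, inserting sep if buf is not empty (truncates safely). */
static void append(char *buf, const char *sep, const char *text)
{
    size_t used = strlen(buf);
    snprintf(buf + used, BUF_MAX - used, "%s%s", used ? sep : "", text);
}

/* Consume the 'D' records starting at recs[*pos], routing each by type, and
   leave *pos at the first non-'D' record. *linecnt is bumped once per 'D'
   record so that anchor positions stay in step. */
static void flush_drecs(const rec *recs, size_t nrecs, size_t *pos, int *linecnt)
{
    notes_buf[0] = series_buf[0] = artist_buf[0] = flags_buf[0] = '\0';

    while (*pos < nrecs && recs[*pos].type[0] == 'D') {
        const char *t = recs[*pos].type;
        const char *txt = recs[*pos].text;
        (*linecnt)++;
        (*pos)++;

        if (strcmp(t, "DQE") == 0)
            append(flags_buf, " ", txt);
        else if (strcmp(t, "DS") == 0)
            append(series_buf, "; ", txt);
        else if (strcmp(t, "DC") == 0)
            append(artist_buf, "; ", txt);
        else if (strcmp(t, "DE") == 0 || strcmp(t, "DN") == 0 || strncmp(t, "DQ", 2) == 0)
            continue;                               /* ignored record types */
        else {
            append(notes_buf, " ", txt);            /* everything else is concatenated */
            if (strncmp(txt, "Incomplete Data ", 16) == 0)
                append(flags_buf, " ", "PARTIAL");
        }
    }
}
```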

FORMAT_AUTHOR - Format an Author Name as heading (with scanitem as input)

/************************************************************************/
/*									*/
/*  FORMAT_AUTHOR - Format an Author Name as heading			*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	outbuf_ptr = FORMAT_AUTHOR (config_ptr, scanitem_ptr,		*/
/*				    auth_flag, link_typ, got_items,	*/
/*				    prtfil_ptr)				*/
/*									*/
/*  Where:								*/
/*									*/
/*	outbuf_ptr   = Pointer to buffer containing formatted name	*/
/*		     = NULL if an error					*/
/*		     = "" if we don't want this one			*/
/*	config_ptr   = Pointer to CONFIG_DATA structure			*/
/*	scanitem_ptr = Pointer to SCANITEM structure			*/
/*	auth_flag    = FMTAUT_NEW if author name has changed		*/
/*		     = FMTAUT_CNT if we want a continuation name	*/
/*	link_typ     = Type of link required				*/
/*		     = LINK_UNKNOWN if unknown				*/
/*		     = LINK_STYAUT if story authors			*/
/*		     = LINK_BOKAUT if book authors			*/
/*		     = LINK_ARTIST if artists				*/
/*		     = LINK_CHRON if chronological index		*/
/*		     = LINK_BIOG if biographical notes			*/
/*		     = LINK_BIOG2 if biographical notes OR external	*/
/*	got_items    = PSP_TRUE if author has items in this index	*/
/*		     = PSP_FALSE otherwise				*/
/*     	prtfil_ptr   = Pointer to diagnostics file (may be NULL)	*/
/*									*/
/*  This formats a story author into the format used for the headers	*/
/*  in the listing file and for the middle-level index.			*/
/*									*/
/*  Note that the returned string has an embedded </A> to terminate	*/
/*  the <A HREF or <A NAME that will precede it.			*/
/*									*/
/************************************************************************/

This routine ...

FORMAT_AUTHOR2 - Format an Author Name as heading (with AUTH structure as input)

/************************************************************************/
/*									*/
/*  FORMAT_AUTHOR2 - Format an Author Name as heading			*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	outbuf_ptr = FORMAT_AUTHOR2 (config_ptr, auth_ptr, link_typ,	*/
/*				     prtfil_ptr)			*/
/*									*/
/*  Where:								*/
/*									*/
/*	outbuf_ptr   = Pointer to buffer containing formatted name	*/
/*		     = NULL if an error					*/
/*	config_ptr   = Pointer to CONFIG_DATA structure			*/
/*	auth_ptr     = Pointer to author name				*/
/*	link_typ     = Type of link required				*/
/*		     = LINK_UNKNOWN if unknown				*/
/*		     = LINK_STYAUT if story authors			*/
/*		     = LINK_BOKAUT if book authors			*/
/*		     = LINK_ARTIST if artists				*/
/*		     = LINK_CHRON if chronological index		*/
/*		     = LINK_BIOG if biographical notes			*/
/*		     = LINK_BIOG2 if biographical notes OR external	*/
/*      prtfil_ptr   = Pointer to diagnostics file (may be NULL)	*/
/*									*/
/*  This is a shell around FORMAT_AUTHOR that is called when we don't	*/
/*  have a scanitem to hand.						*/
/*									*/
/************************************************************************/

This routine simply sets up the Normalised Author field in a local scanitem structure from the passed parameter and then calls FORMAT_AUTHOR.

FORMAT_AUTHTYPE - Format an Author Type as subheading

/************************************************************************/
/*									*/
/*  FORMAT_AUTHTYPE - Format an Author Type as subheading		*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	outbuf_ptr = FORMAT_AUTHTYPE (auth_type)			*/
/*									*/
/*  Where:								*/
/*									*/
/*	outbuf_ptr  = Pointer to formatted buffer			*/
/*	auth_type   = Author type					*/
/*									*/
/*  This formats a story author type into the format used for the	*/
/*  subheaders in the Story Author Listings File.			*/
/*									*/
/************************************************************************/

This routine checks whether the SCANAUT_MAYBE flag is set and, if so, starts the buffer with ", [?]"; it then appends a suffix (such as ", after.") corresponding to the author type.

FORMAT_BOOK_DETAILS - Format a set of book details

/************************************************************************/
/*									*/
/*  FORMAT_BOOK_DETAILS - Format book details				*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	FORMAT_BOOK_DETAILS (outbuf_ptr, fld_ptr, bufsiz)		*/
/*									*/
/*  Where:								*/
/*									*/
/*	outbuf_ptr = Pointer to output buffer				*/
/*	fld_ptr    = Array of field pointers				*/
/*	bufsiz	   = Size of output buffer				*/
/*									*/
/************************************************************************/

This routine formats the contents of an 'A' record into the same book detail format that FORMAT_PUBDET generates for a real abbreviation. It is called from SETUP_BOKAUT_IDX for books without a real ID, and generates a string along the lines of:

Twelve Tales by Grant Allen, G. Richards, 1899

where the string has been made HTML-safe, the title and publisher have had any extraneous bits stripped off and the author/editor name(s) are formatted via BUILD_AUTH.

FORMAT_FULLTEXT_LINK - Format a Full Text link field

/************************************************************************/
/*									*/
/*  FORMAT_FULLTEXT_LINK - Format a Full Text link field		*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	buff_ptr = FORMAT_FULLTEXT_LINK (link_ptr)			*/
/*									*/
/*  Where:								*/
/*									*/
/*	buff_ptr = Returned pointer to formatted field			*/
/*	link_ptr = Pointer to link field				*/
/*									*/
/*  This routine formats the contents of a full text link field.	*/
/*									*/
/************************************************************************/

This routine ...

FORMAT_ISSUE_LINK - Format link to specified issue (if possible)

/************************************************************************/
/*									*/
/*  FORMAT_ISSUE_LINK - Format link to specified issue (if possible)	*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	link_ptr = FORMAT_ISSUE_LINK (config_ptr, pubdet_ptr,		*/
/*				  prtfil_ptr)				*/
/*									*/
/*  Where:								*/
/*									*/
/*	link_ptr   = Pointer to Issue Details with link			*/
/*	config_ptr = Pointer to CONFIG_DATA structure			*/
/*	pubdet_ptr = Issue details to find link for			*/
/*	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/************************************************************************/

This routine calls GET_PUBDET_LINK to see if the specified issue/book is in our index and, if so, to return the page and anchor references for it. It then calls FORMAT_PUBDET to format the publication details into the format we want and returns a pointer to an internal static buffer containing the formatted publication details, surrounded (if appropriate) by an A HREF link to the associated part of the index.

FORMAT_MAGPUB - Format Magazine Issue Details

/************************************************************************/
/*									*/
/*  FORMAT_MAGPUB - Format magazine issue details			*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	FORMAT_MAGPUB (pubdet_ptr, fld_ptr, bufsiz)			*/
/*									*/
/*  Where:								*/
/*									*/
/*	pubdet_ptr = Pointer to output buffer				*/
/*	fld_ptr    = Array of field pointers				*/
/*	bufsiz	   = Size of output buffer				*/
/*    	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/************************************************************************/

FORMAT_NAMES - Format a names field

/************************************************************************/
/*									*/
/*  FORMAT_NAMES - Format names field					*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	buff_ptr = FORMAT_NAMES (config_ptr, names_ptr, link_typ,	*/
/*				 format_typ, newtxt_ptr, prtfil_ptr)	*/
/*									*/
/*  Where:								*/
/*									*/
/*	buff_ptr   = Returned pointer to formatted names		*/
/*		   = NULL if we had an error				*/
/*	config_ptr = Pointer to CONFIG_DATA structure			*/
/*	names_ptr  = Pointer to names field				*/
/*	link_typ   = Type of link required				*/
/*		   = LINK_UNKNOWN if unknown				*/
/*		   = LINK_STYAUT if story authors			*/
/*		   = LINK_BOKAUT if book authors			*/
/*		   = LINK_ARTIST if artists				*/
/*		   = LINK_CHRON if chronological index			*/
/*		   = LINK_BIOG if biographical notes			*/
/*		   = LINK_BIOG2 if biographical notes OR external flag	*/
/*	format_typ = FMTNAM_NORMAL if we want all names			*/
/*		   = FMTNAM_HOUSE if we only want names in the index	*/
/*		     and want dates added to each			*/
/*		   = FMTNAM_PSEUD if we only want names in the index	*/
/*	newtxt_ptr = Replacement text to display			*/
/*		  (this is only valid if there is only a single primary	*/
/*		   name and no secondary names)				*/
/*	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/************************************************************************/

This routine formats the contents of a names field, adding links to the associated section for each name. It parses a string of names into the format required by the indexes, and it is (seriously) complicated because the same basic mechanism is shared by a wide range of differing circumstances that each need careful handling. For example:

In general, all names are linked to one of the five name-based indexes (BOKAUT, STYAUT, ARTIST, CHRON & BIOG), but the BIOG ("About") links vary depending on whether we are linking from within the BIOG index (when all links should be local) or from elsewhere (when external links should go directly to the external site), so there is an additional type, BIOG2, to handle the latter case. There is also an UNKNOWN type which, in theory, tries each of the indexes until it finds a link; it's unclear whether or not this is useful.
In general, names are passed through as a standard name string in "internal format", but when we are parsing lists of pseudonyms or house names they are passed through as a string of authors separated by '/' characters. Technically this is the same as internal format, but some of these strings contain a LOT of names, which would mean extending the limits in AUTH_RTN unnecessarily, so we handle them by hand instead.
In general we want to display all specified names whether they are in our index or not, but this does not apply to pseudonyms and house names, where we ONLY want the names that exist in our index. There might be no such names, in which case an empty string is returned. Pseudonyms and house names are treated in the same way EXCEPT that dates are appended to each name if FMTNAM_HOUSE is specified.
In general the (reverted) author's name is output within each hyperlink. However, if we have multiple authors with the same surname, the surname is suppressed on all but the last name. In addition, when handling embedded hyperlinks of the form "[%xxx]", a replacement text might be specified, which is used in place of the name. This is only valid if there is a single primary name and no secondary name(s).

A general complication is that we want to handle dividers neatly, so that ", " separates every pair of authors except the last pair, which is separated by " & ". We need to do this for the primary authors as a group, and for each set of (local or global) secondary authors as a group. It isn't easy to know when we've reached the last pair of authors (for primary authors because some might be suppressed; for secondary authors because there are multiple types held in the same structures), so we insert ", " dividers everywhere and remember, for each batch, where we put them. At the end of the batch, if there was at least one divider, we replace the last one with " & ".

In addition to the above, the routine is called in two different contexts. In general we have the whole author string to be formatted, but in STYAUT the author has already been output as the heading, so all we have to parse are the secondary names (which could be local or global). The secondary names are formatted identically in both cases. Thus, for example:

	Maupassant, Guy de ,(tr:Wilson, Mary W.)

is translated as:

	Guy de Maupassant; translated by Mary W. Wilson

while:

	 ,(tr:Wilson, Mary W.)

is translated as:

	translated by Mary W. Wilson

However, complications arise if one of the names is specified as "Anon." or "[Unknown Author]", because we then need to distinguish between the first case, with a primary name of "Anon.", and the second case, where we don't have a primary name at all. For example:

	Anon. ,(by:Wilson, Mary W.)

is translated as:

	Mary W. Wilson, uncredited.

while:

	 ,(by:Wilson, Mary W.)

is translated as:

	(by Mary W. Wilson)

This is handled by two flags: got_anon, which is set in the former case, and no_primary_name, which is set in the latter. Note that got_anon is only set if there are some secondary names; otherwise we just output it as [uncredited].

If we only have 1 primary author, which is "Anon." or equivalent and has no secondary authors, AND there are some global secondary names, then we suppress the primary author (except when link_typ is LINK_BOKAUT). This is so that constructs such as "Anon. ,(as told to:Smith, Fred)" are displayed as just "as told to Fred Smith".
If we have multiple authors who all share the same surname, we suppress the surname on all but the final author (except when link_typ is LINK_BOKAUT).
For all link types except LINK_BOKAUT, the authors are output as a single string, separated by commas with an ampersand for the last pair, followed by any global secondary names; for LINK_BOKAUT the first primary author is left inverted, does not have a link added and is separated from the other authors by a special divider of ^||.
If LINK_UNKNOWN is specified the routine tries LINK_STYAUT first followed by LINK_ARTIST (not sure if this is needed or not).

FORMAT_NOTES - Format a Notes Field

/************************************************************************/
/*									*/
/*  FORMAT_NOTES - Format a notes field					*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	status = FORMAT_NOTES (config_ptr, outbuf_ptr, notes_ptr,	*/
/*			       outbuflen, link_typ, prtfil_ptr);	*/
/*									*/
/*  Where:								*/
/*									*/
/*	status	   = PSP_TRUE if notes formatted successfully		*/
/*		   = PSP_FALSE otherwise				*/
/*	config_ptr = Pointer to CONFIG_DATA structure			*/
/*	outbuf_ptr = Pointer to output buffer				*/
/*	notes_ptr  = Pointer to input buffer				*/
/*	outbuflen  = Size of output buffer				*/
/*	link_typ   = Type of link required				*/
/*		   = LINK_UNKNOWN if unknown				*/
/*		   = LINK_STYAUT if story authors			*/
/*		   = LINK_BOKAUT if book authors			*/
/*		   = LINK_ARTIST if artists				*/
/*		   = LINK_CHRON if chronological index			*/
/*		   = LINK_BIOG if biographical notes			*/
/*		   = LINK_BIOG2 if biographical notes OR external flag	*/
/*	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/*  This routine formats the contents of a notes field, translating	*/
/*  any [@ or [% hyperlinks.						*/
/*									*/
/*  Note that this routine APPENDS to the output buffer.		*/
/*									*/
/************************************************************************/

This routine ...

FORMAT_PUBDATE - Format Pub. Info. date field

/************************************************************************/
/*									*/
/*  FORMAT_PUBDATE - Format Pub. Info. date field			*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	buff_ptr = FORMAT_PUBDATE (date_ptr)				*/
/*									*/
/*  Where:								*/
/*									*/
/*	buff_ptr = Returned pointer to formatted date			*/
/*		 = NULL if we had an error				*/
/*	date_ptr = Pointer to date field				*/
/*	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/*  This routine formats the contents of a date field in the Pub. Info.	*/
/*  section. These are of the form [cc]yy/mm[/dd].			*/
/*									*/
/************************************************************************/

This routine ...

FORMAT_SERIES - Format series field

/************************************************************************/
/*									*/
/*  FORMAT_SERIES - Format series field					*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	buff_ptr = FORMAT_SERIES (config_ptr, series_ptr, include_flg,	*/
/*				   prtfil_ptr)				*/
/*									*/
/*  Where:								*/
/*									*/
/*	buff_ptr	= Returned pointer to formatted series		*/
/*			= NULL if we had an error			*/
/*			= "" if no matching series			*/
/*	config_ptr	= Pointer to CONFIG_DATA structure		*/
/*	series_ptr	= Pointer to series field			*/
/*	include_flg 	= PSP_TRUE if we want all series		*/
/*			= PSP_FALSE if we only want series in the index	*/
/*	prtfil_ptr	= Pointer to diagnostics file (may be NULL)	*/
/*									*/
/*  This routine formats the contents of a series field, adding links	*/
/*  to the associated section for the series.				*/
/*									*/
/************************************************************************/

FORMAT_URL - Format a URL specified in an FM data file

/************************************************************************/
/*									*/
/*  FORMAT_URL - Format a URL specified in an FM data file		*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	outbuf_ptr = FORMAT_URL (url_ptr, text_flg, prtfil_ptr)		*/
/*									*/
/*  Where:								*/
/*									*/
/*	outbuf_ptr = Pointer to formatted buffer			*/
/*	url_ptr	   = Pointer to input URL				*/
/*	text_flg   = PSP_FALSE if formatting the URL as hyperlink	*/
/*		   = PSP_TRUE if formatting the URL for display		*/
/*    	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/*  This formats a URL into the format required for use as a hyperlink	*/
/*  or for display to the user.						*/
/*									*/
/************************************************************************/

This routine ...

GET_ANCHOR - Get the HTML to link to a specific anchor

/************************************************************************/
/*									*/
/*  GET_ANCHOR - Get the HTML to link to a specific anchor		*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	link_ptr = GET_ANCHOR (config_ptr, prefix_ptr, page_count,	*/
/*			       anchor_count)				*/
/*									*/
/*  Where:								*/
/*									*/
/*	link_ptr     = Pointer to HTML link				*/
/*	config_ptr   = Pointer to CONFIG_DATA structure			*/
/*	prefix_ptr   = Filename/folder prefix				*/
/*	page_count   = Page number					*/
/*	anchor_count = Anchor number					*/
/*		     = -1 if just page link wanted			*/
/*									*/
/************************************************************************/

Depending on the setting in the Index Configuration File, all the files in an index might be in a single folder or in multiple subfolders. To avoid checking this all over the place (and to allow for future flexibility), all links are set up via this routine, which simply constructs the relevant HTML link.

GET_ANCHOR2 - Get the HTML to link to a name index anchor

/************************************************************************/
/*									*/
/*  GET_ANCHOR2 - Get the HTML to link to a name index anchor		*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	link_ptr = GET_ANCHOR2 (config_ptr, link_typ, page_count,	*/
/*			       anchor_count)				*/
/*									*/
/*  Where:								*/
/*									*/
/*	link_ptr     = Pointer to HTML link				*/
/*	config_ptr   = Pointer to CONFIG_DATA structure			*/
/*	link_typ     = Type of link required				*/
/*		     = LINK_STYAUT if story authors			*/
/*		     = LINK_BOKAUT if book authors			*/
/*		     = LINK_ARTIST if artists				*/
/*		     = LINK_CHRON if chronological index		*/
/*		     = LINK_BIOG or LINK_BIOG2 if biographical notes	*/
/*	page_count   = Page number					*/
/*		     = -1 if just directory link wanted			*/
/*	anchor_count = Anchor number					*/
/*		     = -1 if just page link wanted			*/
/*									*/
/*  This is just a shell around GET_ANCHOR				*/
/*									*/
/************************************************************************/

This routine ...

GET_BOKTTL_INDEX - Get index in book title links table

/************************************************************************/
/*									*/
/*  GET_BOKTTL_INDEX - Find index in book title links table for a	*/
/*		       particular record number.			*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	index = GET_BOKTTL_INDEX (recnum);				*/
/*									*/
/*  Where:								*/
/*									*/
/*	index  = Index into book title links table			*/
/*	       = -1 if no link found					*/
/*	recnum = Record number to look for				*/
/*									*/
/************************************************************************/

This routine simply looks up the specified record number in the Book Title Index and returns the offset into the arrays (or -1 if the details can't be found).

GET_NAME_INDEX - Get index of specified name (if possible)

/************************************************************************/
/*									*/
/*  GET_NAME_INDEX - Get index of specified name (if possible)		*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	index = GET_NAME_INDEX (name_ptr, link_typ, prtfil_ptr)		*/
/*									*/
/*  Where:								*/
/*									*/
/*	index      = Index into tables for specified name		*/
/*		   = -1 if link not found				*/
/*	name_ptr   = Name to find link for				*/
/*	link_typ   = Type of link required				*/
/*		   = LINK_UNKNOWN if unknown				*/
/*		   = LINK_STYAUT if story authors			*/
/*		   = LINK_BOKAUT if book authors			*/
/*		   = LINK_ARTIST if artists				*/
/*		   = LINK_CHRON if chronological index			*/
/*		   = LINK_BIOG if biographical notes			*/
/*		   = LINK_BIOG2 if biographical notes OR external flag	*/
/*    	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/************************************************************************/

This routine ...

GET_NAME_LINK - Get link to specified name (if possible)

/************************************************************************/
/*									*/
/*  GET_NAME_LINK - Get link to specified name (if possible)		*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	GET_NAME_LINK (name_ptr, link_typ, pageno_ptr, anchor_ptr,	*/
/*		       prtfil_ptr)					*/
/*									*/
/*  Where:								*/
/*									*/
/*	name_ptr   = Name to find link for				*/
/*	link_typ   = Type of link required				*/
/*		   = LINK_UNKNOWN if unknown				*/
/*		   = LINK_STYAUT if story authors			*/
/*		   = LINK_BOKAUT if book authors			*/
/*		   = LINK_ARTIST if artists				*/
/*		   = LINK_CHRON if chronological index			*/
/*		   = LINK_BIOG if biographical notes			*/
/*		   = LINK_BIOG2 if biographical notes OR external flag	*/
/*	pageno_ptr = Pointer to int to hold page number			*/
/*		   = -1 if link not found				*/
/*		   = -2 if LINK_BIOG2 & we have external link		*/
/*	anchor_ptr = Pointer to int to hold anchor			*/
/*    	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/************************************************************************/

GET_PUBDET_INDEX - Get offset of link to specified item (if possible)

/************************************************************************/
/*									*/
/*  GET_PUBDET_INDEX - Find set of publication details in issue links	*/
/*			  table (if possible)				*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	index = GET_PUBDET_INDEX (pubdet_ptr)				*/
/*									*/
/*  Where:								*/
/*									*/
/*	index	   = Index into issue links table			*/
/*		   = -1 if no link found				*/
/*	pubdet_ptr = Publication details to look for			*/
/*									*/
/************************************************************************/

This routine simply looks up the specified publication details in the Issue Link Table (first converting them to the "old" format if necessary) and returns the offset into the arrays (or -1 if the details can't be found).

GET_SERIES_LINK - Get link to specified series (if possible)

/************************************************************************/
/*									*/
/*  GET_SERIES_LINK - Get link to specified series (if possible)	*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	GET_SERIES_LINK (series_ptr, pageno_ptr, anchor_ptr, prtfil_ptr)*/
/*									*/
/*  Where:								*/
/*									*/
/*	series_ptr = Series name to find link for			*/
/*	pageno_ptr = Pointer to int to hold page number			*/
/*		   = -1 if link not found				*/
/*	anchor_ptr = Pointer to int to hold anchor			*/
/*	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/************************************************************************/

READ_BOOK_FILE - Read book data file into consolidated book data file

/************************************************************************/
/*									*/
/*  READ_BOOK_FILE - Read book data file & add to consolidated file	*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	status = READ_BOOK_FILE (filnam_ptr, outfil_ptr, prtfil_ptr);	*/
/*									*/
/*  Where:								*/
/*									*/
/*	status	   = Result of operation:				*/
/*		   = PSP_TRUE if OK; else PSP_FALSE			*/
/*	filnam_ptr = Name of book data file				*/
/*	outfil_ptr = Consolidated data file to write to			*/
/*    	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/*  This routine opens the specified book data file and copies all the	*/
/*  contents to the output file, adding a sort prefix.			*/
/*									*/
/************************************************************************/

While each individual book data file will probably already be sorted (thanks to ASORT), we will be combining multiple book data files, so we need to sort them all into a consistent order. This routine is called for each book data file and simply copies its contents to the consolidated data file, adding a sort prefix to each record. Each output record consists of the output from LOCUS_COMPACT for the most recent/current 'A' record, followed by a six-digit record number (to keep items in order within a book, and to keep books with the same compacted form in the same order as in the data file), followed by a second '~' divider, followed by the input record itself.

Note that this routine:

Ignores all records before the first "real" 'A' record
Ignores all blank records

The main complication is introduced by the use of DQN records to suppress an entire book. Early versions of the code left this to the later processing, but that led to page-sizing problems (as well as being inelegant). The next version tried to use fgetpos & fsetpos on the output file to reset it to just before the preceding 'A' record if we hit a DQN record, but that presented possible portability problems and, more to the point, didn't seem to work for obscure reasons. Rather than try to sort out the problem, the next version got rid of fgetpos and fsetpos and instead read the records into an array of buffers:

read_first loops around until it has the first 'A' record that isn't the file header; the 'A' record is stored in recbuf[0]
read_drecs then loops around reading all following 'D' records into recbuf[n]; note that:
- If a "DQN" record is encountered then the suppress_book flag is set to say so;
- Otherwise all "DQ" records are ignored as we don't want them in the index itself;
- By definition we will always end up with the next record after the last 'D' record in the array as well;
process_recs is reached when we encounter the first non-'D' record; we know we must have an 'A' record in recbuf[0], so it calls LOCUS_COMPACT to set up the sort header
do_next then outputs each record to the output file (as long as the suppress_book flag is not set), incrementing the record count for each, and drops through to read_next, which reads the next record either from the array or (if we have exhausted the array) from the file; when it finds the next 'A' record it resets the relevant flags, makes sure the 'A' record is in recbuf[0] (if the 'A' record immediately followed the 'D' records for the previous book it would be at the end of the array) and branches back to read_drecs to handle the next book.

READ_MAG_FILES - Read all magazine files into a consolidated magazine data file

/************************************************************************/
/*									*/
/*  READ_MAG_FILES - Read magazine data files, cover scan file and	*/
/*		     full-text link file & create consolidated file	*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	status = READ_MAG_FILES (config_ptr, prtfil_ptr);		*/
/*									*/
/*  Where:								*/
/*									*/
/*	status	   = Result of operation:				*/
/*		   = PSP_TRUE if OK; else PSP_FALSE			*/
/*	config_ptr = Pointer to CONFIG_DATA structure			*/
/*    	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/************************************************************************/

This routine sets up sorted lists of cover scan, full text and about links and then reads through the magazine data files creating a consolidated data file with the relevant links added. Note that these are added as "extra" fields (FLD_IMGLNK & FLD_TXTLNK) at the end of the 'A' record. In an ideal world we could just read the files into memory and access them when needed (as we do for PSEUD.CVT and such-like) but I'm getting paranoid about memory usage and we only need the links when processing the 'A' records and can then free up the allocated memory. It also means we only have to deal with two input files in SETUP_ISSUE_IDX and BUILD_ISSUE_IDX rather than reading through all the magazine files each time.

Note that we check the cover scans first because there are many more of them (130,000 vs. 9,000) and, when adding the full-text links, we do a binary chop on the cover-link portion of the list to save time (we initially did a sequential search and this module took 15 seconds or more). The about links are associated only with (main) feature records, so they are just added as separate entries. Note that the file containing the about links is not in a standard folder and hence is specified via the Index Configuration File.

REPORT_DIAGS - Report Extended Diagnostics (if required)

/************************************************************************/
/*									*/
/*  REPORT_DIAGS - Report Extended Diagnostics (if required)		*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	REPORT_DIAGS (config_ptr, report_type);				*/
/*									*/
/*  Where:								*/
/*	config_ptr  = Pointer to CONFIG_DATA structure			*/
/*	report_type = Type of data to report:				*/
/*		    = DMP_MAGLIST for Magazine File List		*/
/*		    = DMP_CONFIG for Configuration Data			*/
/*		    = IDX_ARTIST for Artist Index			*/
/*		    = IDX_BIOG for Biographical Notes			*/
/*		    = IDX_BOKAUT for Book Author Index			*/
/*		    = IDX_CHRON for Chronological Index			*/
/*		    = IDX_ISSUE for Issue Index				*/
/*		    = IDX_SERIES for Series Index			*/
/*		    = IDX_STYAUT for Story Author Index			*/
/*		    = IDX_FULLTXT for Full Text Index			*/
/*									*/
/************************************************************************/

This routine simply checks whether extended diagnostics are required and, if so, dumps out the array(s) associated with the specified report type.

SPLIT_TITLE - Split title field into constituent parts

/************************************************************************/
/*									*/
/*  SPLIT_TITLE - Split title field into constituent parts		*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	SPLIT_TITLE (title_ptr, titlad_ptr, itemad_ptr, colhdr_ptr_ptr,	*/
/*		     divider_ptr_ptr, title_ptr_ptr)			*/
/*									*/
/*  Where:								*/
/*									*/
/*	buff_ptr	= Returned pointer to formatted title		*/
/*	title_ptr	= Pointer to title field			*/
/*	titlad_ptr	= Pointer to title additional field		*/
/*	itemad_ptr	= Pointer to item additional field		*/
/*	colhdr_ptr_ptr	= Pointer to hold pointer to column title	*/
/*	divider_ptr_ptr = Pointer to hold pointer to divider		*/
/*	title_ptr_ptr	= Pointer to hold pointer to item title		*/
/*									*/
/*  This routine combines the title and item additional fields with	*/
/*  the title, intelligently handling sort titles, series dividers	*/
/*  and story numbers, and returns pointers to the three elements of	*/
/*  the title - the column/series name, the string to use as a divider	*/
/*  and	the item title (including any series sequence). The first	*/
/*  two fields are returned as an empty string if not relevant.		*/
/*									*/
/************************************************************************/

This routine ...

TIDY_DNOTES - Tidy up contents of any 'D' Notes

/************************************************************************/
/*									*/
/*  TIDY_DNOTES - Tidy up contents of any 'D' Notes			*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	outbuf_ptr = TIDY_DNOTES (config_ptr, inpbuf_ptr, prtfil_ptr);	*/
/*									*/
/*  Where:								*/
/*									*/
/*	outbuf_ptr = Pointer to buffer with tidied notes.		*/
/*	config_ptr = Pointer to CONFIG_DATA structure			*/
/*	inpbuf_ptr = Pointer to input buffer				*/
/*	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/*  This routine checks to see if the 'D' notes need any special	*/
/*  formatting (e.g. "--- see under xxx").				*/
/*									*/
/************************************************************************/

This routine first checks whether the notes start with the special prefix "--- see under " and, if so, checks the next character to see what sort of link is required:

A '{' indicates a magazine (issue) link and can be specified as either publication details or text (i.e. a magazine name).
A '<' indicates a book link, which can only be specified as publication details.
A '(' indicates an author name which must be in external format.
Anything else is illegal.

Publication details and magazine names are converted via FORMAT_ISSUE_LINK (with a ':' prefixed in the latter case); names are converted to internal format via TRANSLATE_AUTH and then converted via FORMAT_NAMES. Magazine links are output in italics; book links in bold and author links normally.

If the notes do not start with the special prefix then the routine simply calls the shared FORMAT_NOTES routine.
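The prefix check and the dispatch on the following character can be sketched as below. This is only an illustration under assumptions: the enum, its values and the function name classify_dnote are hypothetical, and the real routine goes on to build the link text via FORMAT_ISSUE_LINK, TRANSLATE_AUTH and FORMAT_NAMES rather than just returning a code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical link kinds; not the names used in the IDXGEN source. */
enum dnote_link {
    DNOTE_PLAIN,     /* no special prefix: pass to FORMAT_NOTES */
    DNOTE_MAGAZINE,  /* '{' : magazine (issue) link, output in italics */
    DNOTE_BOOK,      /* '<' : book link, output in bold */
    DNOTE_AUTHOR,    /* '(' : author name in external format */
    DNOTE_ILLEGAL    /* anything else after the prefix */
};

static const char SEE_UNDER[] = "--- see under ";

/* Classify 'D' notes by the character that follows the special prefix. */
enum dnote_link classify_dnote(const char *notes)
{
    size_t plen = sizeof SEE_UNDER - 1;   /* prefix length without the NUL */

    if (strncmp(notes, SEE_UNDER, plen) != 0)
        return DNOTE_PLAIN;

    switch (notes[plen]) {
    case '{': return DNOTE_MAGAZINE;
    case '<': return DNOTE_BOOK;
    case '(': return DNOTE_AUTHOR;
    default:  return DNOTE_ILLEGAL;
    }
}
```

A caller would switch on the result, passing DNOTE_PLAIN notes to FORMAT_NOTES and reporting DNOTE_ILLEGAL as an error.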

WRITE_INDEX_LINE - Write a line to the intermediate index(es)

/************************************************************************/
/*									*/
/*  WRITE_INDEX_LINE - Write a line to the intermediate index(es)	*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	status = WRITE_INDEX_LINE (config_ptr, reqtype, text_ptr,	*/
/*				   prtfil_ptr);				*/
/*									*/
/*  Where:								*/
/*									*/
/*	status	    = Result of operation:				*/
/*		    = PSP_TRUE if OK; else PSP_FALSE			*/
/*	config_ptr  = Pointer to CONFIG_DATA structure			*/
/*	reqtype	    = Request Type:					*/
/*		    = IDXLIN_NORMAL for normal index line		*/
/*		    = IDXLIN_CONT for continuation line			*/
/*		    = IDXLIN_SPECIAL for special index lines		*/
/*		    = IDXLIN_TRAIL for trailers				*/
/*	text_ptr    = text to use for continuation or for special	*/
/*		      (if IDXLIN_CONT or IDXLIN_SPECIAL)		*/
/*		    = text to use for continuation otherwise		*/
/*    	prtfil_ptr  = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/*  This routine outputs a line to the Level 1 Index, throwing a new	*/
/*  page and adding a line to the Level 2 Index, etc., if necessary.	*/
/*									*/
/************************************************************************/

This routine is called when a line is to be output to the intermediate (Level 1 to 3) Indexes and implements the 3-level index structure described in the Basic Index Structure - see the section on using WRITE_INDEX_LINE for an outline description of how to use the routine.

It outputs the relevant line to the Level 1 Index; if necessary it also throws a new page and adds a line to the Level 2 Index and, if also necessary, throws a new page in the Level 2 Index and adds a line to the Level 3 Index. In addition to the formal parameters, it relies on the following fields in config_ptr:

standard items defining the index:
- curr_index_type - current index type being processed
- minpagesize - minimum number of lines in a page
- subfolders - determines if files should be generated in sub-folders
- idxdir - folder to generate indexes into
information for formatting the Level 1 Index (not needed for IDXLIN_SPECIAL):
- formatted_name - formatted name of current item (i.e. format to use in Level 1 Index; not needed for IDXLIN_SPECIAL)
- lstpage_count - current page number in Listings Level
- lstanchor_count - current anchor number in Listings Level (not needed for IDXLIN_SPECIAL)
information needed for Level 2 & 3 Indexes:
- curr_name - name of current item

It operates in one of four modes depending on the contents of the reqtype parameter. By default this is set to IDXLIN_NORMAL indicating that a normal index line is to be output. If so:

the routine first checks a static flag to see if this is the first time and, if so:
- it opens the "top level" output file for this index (as defined by the static topfilnam_ptr table) and writes a standard page header to it via WRITE_PAGE_HEADER
- it then works out how many index levels we need by looking at the number of lines in the level 1 index (idxlinecnt) and the minimum number of lines in a page (minpagesize); idxlinecnt is basically a count of each new item (e.g. author) needing an index entry plus each time a continuation page is thrown; the calculation is biased in favour of fewer levels to try to avoid ever having a top page with a single line in it.
- if only one index level is needed, it sets the top level output file as the level 1 index file; if two levels are needed it sets it as the level 2 index file and sets the level 1 line count to force an immediate new page; if three levels are needed it sets it as the level 3 index file and sets both level 1 and level 2 line counts to force an immediate new page.
- it also works out whether names in the indexes should be output in bold or italics.
the routine then counts another line in the level 1 index and checks to see if we have at least 2 levels of index and if the level 1 line count now exceeds the maximum for a page (always true first time round) and, if so:
- increments the level 1 page count and, if this isn't the first page:
  - writes a page trailer to the current level 1 index file
  - writes a line to the level 2 index file describing the current level 1 page
- calls CREATE_NEW_PAGE to close the current level 1 index file and open a new one
- writes a header to the new level 1 index file
- saves the current name (curr_name) as the first item for the next level 2 index line
- increments the level 2 line count
- if we have 3 levels of index and if the level 2 line count now exceeds the maximum for a page (always true first time round), repeats the whole process for the level 2/3 indexes
saves the current name (curr_name) as the last item (so far) for the next level 2 & level 3 index lines
formats and outputs the line for the level 1 index
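The level calculation is only described in outline above, so the following is a plausible sketch rather than the real formula. It assumes (hypothetically) that a page holds at most twice minpagesize lines and that each Level 1 page contributes one line to the Level 2 Index. Under those assumptions a new level is only added once the level below overflows a page, so the top page always has at least two lines. The function name count_index_levels is invented for illustration.

```c
#include <assert.h>

/*
 * Hypothetical sketch: decide how many index levels (1-3) are needed.
 * Assumes each page holds up to pagemax = 2 * minpagesize lines and that
 * each Level 1 page contributes one line to the Level 2 Index.
 */
int count_index_levels(long idxlinecnt, long minpagesize)
{
    long pagemax, l2lines;

    assert(minpagesize > 0);
    pagemax = 2 * minpagesize;

    if (idxlinecnt <= pagemax)
        return 1;                    /* everything fits on the top page */

    /* one Level 2 line per Level 1 page (rounded up) */
    l2lines = (idxlinecnt + pagemax - 1) / pagemax;
    if (l2lines <= pagemax)
        return 2;                    /* top page has at least 2 lines */
    return 3;                        /* Level 3 top page, again at least 2 lines */
}
```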

If reqtype is set to IDXLIN_SPECIAL the routine continues exactly as above except that the line for the level 1 index is passed in the text_ptr parameter rather than being set up automatically (this is used for the Story and Book Title Indexes).

If reqtype is set to IDXLIN_CONT then we have a continuation line. In this case we still count a line in the level 1 index and output the continuation line, but do not throw any new pages.

If reqtype is set to IDXLIN_TRAIL then we need to clean up for this index and reset for the next index:

the routine writes the trailers to the Level 1 Index and closes the associated file;
if there are at least 2 levels for this index, it then writes a final line to the Level 2 Index, followed by the trailers, and closes the associated file;
if there are 3 levels for this index, it then repeats the process for the Level 3 Index;
it then updates the global page count (pages_cnt) with the number of pages used in this index and resets the "first time" flag for next time round.
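The four reqtype modes can be illustrated with a small state machine that tracks only the Level 1 line and page counts. This is a hypothetical sketch: struct idx_state, its field names and the pagemax limit are assumptions (the real routine keeps this information in statics and CONFIG_DATA), the IDXLIN_* values are stand-ins, and file output and the Level 2/3 bookkeeping are left out.

```c
#include <assert.h>

enum idxlin_req { IDXLIN_NORMAL, IDXLIN_CONT, IDXLIN_SPECIAL, IDXLIN_TRAIL };

/* Hypothetical state; the real routine keeps this in statics and CONFIG_DATA. */
struct idx_state {
    int  first_time;  /* 1 until the first NORMAL/SPECIAL line of an index */
    int  levels;      /* index levels (1-3) chosen on the first call */
    long pagemax;     /* maximum lines per page (assumed) */
    long l1_lines;    /* lines on the current Level 1 page */
    long l1_pages;    /* Level 1 pages used by this index */
    long pages_cnt;   /* global page total across all indexes */
};

/* Track Level 1 lines and pages for one request (no file output here). */
void index_line(struct idx_state *st, enum idxlin_req req)
{
    switch (req) {
    case IDXLIN_NORMAL:
    case IDXLIN_SPECIAL:  /* SPECIAL differs only in where the text comes from */
        if (st->first_time) {
            st->first_time = 0;
            if (st->levels >= 2)
                st->l1_lines = st->pagemax;  /* force an immediate page throw */
            else
                st->l1_pages = 1;            /* top file is the Level 1 page */
        }
        st->l1_lines++;
        if (st->levels >= 2 && st->l1_lines > st->pagemax) {
            st->l1_pages++;                  /* new Level 1 page */
            st->l1_lines = 1;
        }
        break;
    case IDXLIN_CONT:     /* counted, but never throws a page */
        st->l1_lines++;
        break;
    case IDXLIN_TRAIL:    /* close off this index and reset for the next */
        st->pages_cnt += st->l1_pages;
        st->l1_pages = 0;
        st->l1_lines = 0;
        st->first_time = 1;
        break;
    }
}
```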

WRITE_PAGE_HEADER - Write a standard header to a file

/************************************************************************/
/*									*/
/*  WRITE_PAGE_HEADER - Write a standard header to the specified file	*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	status = WRITE_PAGE_HEADER (config_ptr, outfil_ptr, pagttl_ptr,	*/
/*				    prvlnk_ptr, homlnk_ptr, prtfil_ptr);*/
/*									*/
/*  Where:								*/
/*									*/
/*	status	   = Result of operation:				*/
/*		   = PSP_TRUE if OK; else PSP_FALSE			*/
/*	config_ptr = Pointer to CONFIG_DATA structure			*/
/*	outfil_ptr = File to write header to				*/
/*	pagttl_ptr = Title of page					*/
/*	prvlnk_ptr = Previous link (if NULL, none created)		*/
/*	homlnk_ptr = Index home link (if NULL, none created)		*/
/*	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/************************************************************************/

This routine simply reads through the Page Header boilerplate file (index_hdr.htm), making the necessary substitutions and outputting the result to the specified file. Note that, for performance reasons, the whole boilerplate file is stored in memory when it is first read to avoid unnecessary file I/O.
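The in-memory caching of the boilerplate might look something like the following sketch. The function names read_whole_file and get_header_boilerplate are hypothetical, as is the choice of a single static buffer; the substitution step is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a NUL-terminated heap buffer; NULL on failure. */
static char *read_whole_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    char *buf = NULL;
    long len;

    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) == 0 && (len = ftell(fp)) >= 0) {
        rewind(fp);
        buf = malloc((size_t)len + 1);
        if (buf != NULL)
            buf[fread(buf, 1, (size_t)len, fp)] = '\0';
    }
    fclose(fp);
    return buf;
}

/*
 * Return the header boilerplate, reading the file only on the first call.
 * Later calls return the cached copy and ignore 'path'.
 */
const char *get_header_boilerplate(const char *path)
{
    static char *cached = NULL;   /* kept for the life of the program */

    if (cached == NULL)
        cached = read_whole_file(path);
    return cached;
}
```

A caller would use get_header_boilerplate("index_hdr.htm") for every page; only the first call touches the file system. The sketch never frees the cached copy and ignores the path on later calls, which is only acceptable because one fixed file is involved.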

WRITE_PAGE_TRAILER - Write a standard trailer to a file

/************************************************************************/
/*									*/
/*  WRITE_PAGE_TRAILER - Write a standard trailer to the specified file	*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	status = WRITE_PAGE_TRAILER (config_ptr, outfil_ptr,		*/
/*				     nxtlnk_ptr, homlnk_ptr,		*/
/*				     prtfil_ptr);			*/
/*									*/
/*	Where:								*/
/*									*/
/*	status	   = Result of operation:				*/
/*		   = PSP_TRUE if OK; else PSP_FALSE			*/
/*	config_ptr = Pointer to CONFIG_DATA structure			*/
/*	outfil_ptr = File to write trailer to				*/
/*	nxtlnk_ptr = Next link (if NULL, none created)			*/
/*	homlnk_ptr = Index home link (if NULL, none created)		*/
/*	prtfil_ptr = Pointer to diagnostics file (may be NULL)		*/
/*									*/
/************************************************************************/

This routine simply reads through the Page Trailer boilerplate file (index_trl.htm), making the necessary substitutions and outputting the result to the specified file. Note that, for performance reasons, the whole boilerplate file is stored in memory when it is first read to avoid unnecessary file I/O.
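The substitution step shared by the header and trailer routines can be sketched as a simple token replacement. The marker text used in the tests (%NEXT%) and the function name substitute are hypothetical, since the actual markers in index_hdr.htm and index_trl.htm are not documented here; treating a NULL link as "drop the marker" is also a simplification of "if NULL, none created".

```c
#include <assert.h>
#include <string.h>

/*
 * Copy tmpl into out, replacing every occurrence of token with value.
 * A NULL value simply drops the token (cf. "if NULL, none created").
 * Returns 0 on success or -1 if out (outsz bytes) is too small.
 */
int substitute(const char *tmpl, const char *token, const char *value,
               char *out, size_t outsz)
{
    size_t toklen = strlen(token);
    size_t vallen = (value != NULL) ? strlen(value) : 0;
    size_t used = 0, rest;
    const char *hit;

    assert(toklen > 0);                       /* empty token would loop forever */
    while ((hit = strstr(tmpl, token)) != NULL) {
        size_t pre = (size_t)(hit - tmpl);    /* text before the token */
        if (used + pre + vallen >= outsz)
            return -1;
        memcpy(out + used, tmpl, pre);
        used += pre;
        if (vallen > 0) {
            memcpy(out + used, value, vallen);
            used += vallen;
        }
        tmpl = hit + toklen;                  /* continue after the token */
    }
    rest = strlen(tmpl);
    if (used + rest >= outsz)
        return -1;
    memcpy(out + used, tmpl, rest + 1);       /* copy tail plus NUL */
    return 0;
}
```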

WRITE_PUB_INFO - Format and output all "Pub Info" records

/************************************************************************/
/*									*/
/*  WRITE_PUB_INFO - Format and output all "Pub Info" records		*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	status = WRITE_PUB_INFO (config_ptr, inpfil_ptr, outfil_ptr,	*/
/*				 nxtrec_ptr_ptr, prtfil_ptr)		*/
/*									*/
/*  Where:								*/
/*									*/
/*	status	       = Result of operation:				*/
/*		       = PSP_TRUE if OK; else PSP_FALSE			*/
/*	config_ptr     = Pointer to CONFIG_DATA structure		*/
/*	inpfil_ptr     = Pointer to input file				*/
/*	outfil_ptr     = File to write output to			*/
/*	nxtrec_ptr_ptr = Pointer to buffer to hold pointer to record	*/
/*	prtfil_ptr     = Pointer to diagnostics file (may be NULL)	*/
/*									*/
/************************************************************************/

This routine ...

WRITE_STORY_ITEM - Write a single story item group

/************************************************************************/
/*									*/
/*  WRITE_STORY_ITEM - Write a single story item group			*/
/*									*/
/*  Calling Format:							*/
/*									*/
/*	status = WRITE_STORY_ITEM (config_ptr, outfil_ptr, fldbuf_ptr,	*/
/*				   fldbufsiz, scanitem_ptr, newpage_flg,*/
/*				   prtfil_ptr)				*/
/*									*/
/*	Where:								*/
/*									*/
/*	status	     = Result of operation:				*/
/*		     = PSP_TRUE if OK; else PSP_FALSE			*/
/*	config_ptr   = Pointer to CONFIG_DATA structure			*/
/*	outfil_ptr   = File to write output to				*/
/*	fldbuf_ptr   = Pointer to buffer for current record		*/
/*	fldbufsiz    = Size of buffer for current record		*/
/*	scanitem_ptr = Pointer to SCANITEM structure for current record	*/
/*	newpage_flg  = PSP_TRUE if we've just thrown a new page		*/
/*		     = PSP_FALSE otherwise				*/
/*    	prtfil_ptr   = Pointer to diagnostics file (may be NULL)	*/
/*									*/
/************************************************************************/

This routine formats and outputs all records which have the same author, author type, title, coauthors, secondary names and ED notes. While this seems fairly simple, it has turned out to be fiendishly complex for a number of reasons, primarily to do with aggregation. For clarity, the principles behind aggregation (and some of the past problems encountered) are discussed in the main IDXGEN documentation. It should really be part of BUILD_STYAUT_IDX (which is the only routine that calls it) but has been split out to try to make the code more manageable. Note that this routine reads most of the records while BUILD_STYAUT_IDX handles all the page throws and it is critical to ensure that the two, between them, match the code in SETUP_STYAUT_IDX as far as bookmarks and page numbers are concerned.

One complication this approach leads to is that WRITE_STORY_ITEM needs some knowledge of what the previous item looked like so that it can decide which fields do or do not need to be output, so there are a handful of pieces of static data that are retained across invocations:

item_titl: the title of the previous item: this is used where we have a column/series with an item title that follows one or more instances of the same column/series without an item title, so that we can list them as a single indented group
prev_colhdr: if we are doing a column/series, this is used to remember the prefix so that it only gets output once
done_indent: this tracks our level of indentation so we know when we do/don't need to do more/less indentation from item to item. This isn't entirely clear and seems to fail in some cases so needs further investigation.
prev_bookdetails: this is a short-term bodge to handle the fact that books aren't currently aggregated in the same way as items so we have to do it by hand across multiple invocations of the routine

Part 1: Initial Setup

The routine first initialises some useful fields:

if the caller indicates we have just thrown a new page then the above three fields are all reset, as we need to redisplay any column/book header if we have thrown a page (this is the only reason for the newpage_flg parameter)
reprintbuf is initialised to say that, by default, any book is not a reprint
secondary_auth is set to indicate if the item is for a secondary author (e.g. an editor, translator or similar) based on the setting of scanitem_ptr->authtype; we use this to decide if we want to output any series names and if we need to add them to an entry as discussed below
in addition doing_subject is set if the item is a subject author based on the setting of scanitem_ptr->authtype
item_type is set up as a short-hand for the first two characters of scanitem_ptr->pubdet_ptr so that it can be compared/copied more easily.

It then sets up the story and column titles, and outputs the latter if needed:

It sets up the formatted Story Title and Column Headers. Note that if we are doing a subject we format both into a single title as there won't be multiple items in the same column/series.
It sets up the Story Title Link
If we have a column header we check to see if it is in the "special column headers" array (i.e. defined in SERIES.CVT with a code of 'Z'). If so we don't want to list this instance so we set a flag (suppress_item) to say so. Note that we still need to consolidate any related items so we can't just exit.
Otherwise, if we have a column header and it is not the same as the previous one, it formats and outputs the appropriate HTML and remembers the column header in prev_colhdr
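The prev_colhdr logic in the last step, together with the reset on newpage_flg described under Part 1, can be sketched as follows. The buffer size and the function name need_column_header are assumptions, and the check against the special column headers (the suppress_item case) is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the routine's static prev_colhdr (size is an assumption). */
static char prev_colhdr[256];

/*
 * Return 1 if the column/series header must be written for this item,
 * remembering it; return 0 if it repeats the previous header.
 * A page throw (newpage_flg) forces the header to be shown again.
 */
int need_column_header(const char *colhdr, int newpage_flg)
{
    if (newpage_flg)
        prev_colhdr[0] = '\0';
    if (colhdr == NULL || colhdr[0] == '\0')
        return 0;                        /* item is not in a column/series */
    if (strcmp(colhdr, prev_colhdr) == 0)
        return 0;                        /* already shown */
    snprintf(prev_colhdr, sizeof prev_colhdr, "%s", colhdr);
    return 1;
}
```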

It then sets up the Item Header (itmhdrbuf) that we need whether we have one item or a thousand:

It starts with an asterisk hyperlinked to the relevant entry in the Index by Title (if we're doing something in a column the asterisk is followed by "___" to indent it).
It then adds the Story Title as formatted above. Note that:
- If the current item is a book, the routine calls GET_PUBDET_INDEX to find a matching entry in the Issue Link Table and, if found, brackets the title with a link to the associated entry. It also calls GET_BOKTTL_INDEX to set up bokttl_sub to point to the entry in the Book Title Link Table (if any).
- If the current item is a book, or an item with a book item type, the title is output in bold.
- If the current item type is "en" then the above is ignored and the magazine is stored, hyperlinked to the relevant part of the Issue Index
If we have some series information it adds that (note that we skip this for secondary authors)
We then set up any bylines and co-authors. Note that, if we're not doing a subject, the byline is output before the co-authors. Initially for subjects we ignored the co-authors, but this didn't work too well for book reviews (particularly when different versions of the same review cited different co-authors, so that the listings were not aggregated but there was no visible reason why). Co-authors are therefore output for book reviews, but in this case before the byline (i.e. before the author of the review). Co-authors don't look so good for other subject entries (e.g. when an article is about multiple authors), so they are still suppressed in that case.
We then set up the subject for the item (if any) as long as we're not already doing a subject. We store this in a separate buffer (subjbuf) as it is output after the item details rather than before it.
If the item type is "mg" (i.e. we're listing the editor of a magazine issue), we then just add a ":"; otherwise, if the item type isn't "en" or "el" we add the item type in brackets.
If we have a book title link with some associated publication details, and we're not doing a reprint that has a previous publisher listed (in scanitem_ptr->prvpub_ptr), then we add the book publication details.
If there are any secondary names (scanitem_ptr->secnam_ptr) it calls FORMAT_NAMES to format them. This returns a string containing "local" secondary names followed by "global" secondary names, separated by "||". If there are any of the former then they are added to the item header while a pointer (gblsecnam_ptr) is set up to point to the latter.
Lastly, it adds any notes (scanitem_ptr->notes_ptr), stripping off any trailing ']' character.
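Splitting the FORMAT_NAMES result at the "||" separator could be done in place as below. The function name split_secnames is hypothetical, and the treatment of a string with no separator (all names local, no global names) is an assumption.

```c
#include <assert.h>
#include <string.h>

/*
 * Split a FORMAT_NAMES result of the form "local||global" in place.
 * *local_out receives the local names (possibly empty); *global_out the
 * global names, or NULL when there is no "||" separator.
 */
void split_secnames(char *names, char **local_out, char **global_out)
{
    char *sep = strstr(names, "||");

    *local_out = names;
    if (sep == NULL) {
        *global_out = NULL;      /* assumption: everything is local */
        return;
    }
    *sep = '\0';                 /* terminate the local part */
    *global_out = sep + 2;       /* global part follows the separator */
}
```

A caller would add *local_out to the item header only if it is non-empty, and keep *global_out as gblsecnam_ptr.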

Note that the above approach for bylines potentially causes problems with book reviews (see the discussion under Notes on Some (Past) Problem Areas) as we might want to change the byline in the item header within a single set of results. We bodge this by saving the offsets in the item header before and after we add the byline so that we can reset it when needed.
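The offset-based bodge amounts to splicing a new byline into the header buffer between the two saved offsets. The sketch below is hypothetical: replace_byline and its parameters are invented names, and the real code works on itmhdrbuf and its own saved offsets.

```c
#include <assert.h>
#include <string.h>

/*
 * Replace the byline held in hdr[byl_start..byl_end) with new_byline.
 * The offsets are the ones saved while the header was being built.
 * Returns 0 on success, or -1 for bad offsets or if hdrsz is too small.
 */
int replace_byline(char *hdr, size_t hdrsz, size_t byl_start, size_t byl_end,
                   const char *new_byline)
{
    size_t len = strlen(hdr);
    size_t newlen = strlen(new_byline);
    size_t taillen;

    if (byl_start > byl_end || byl_end > len)
        return -1;
    taillen = len - byl_end;                  /* text after the byline */
    if (byl_start + newlen + taillen + 1 > hdrsz)
        return -1;
    /* slide the tail (with its NUL) to just after the new byline */
    memmove(hdr + byl_start + newlen, hdr + byl_end, taillen + 1);
    memcpy(hdr + byl_start, new_byline, newlen);
    return 0;
}
```

After a successful call the saved end offset becomes byl_start + strlen(new_byline), so a caller that may change the byline again must update it.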

Part 2: Consolidate Items

We're now ready to try to consolidate all the items that we wish to display as a group. The specific check at the moment is for items which:

are by the same author (scanitem_ptr->auth_ptr) with the same author type (scanitem_ptr->authtype)
have the same title (scanitem_ptr->ttad_ptr + scanitem_ptr->titl_ptr)
have the same co-authors (scanitem_ptr->coauth_ptr)
have the same secondary names (scanitem_ptr->secnam_ptr)
have the same subject (scanitem_ptr->subject_ptr)
have the same ED notes (scanitem_ptr->ednote_ptr)
have the same item type (1st 2 characters of scanitem_ptr->pubdet_ptr)
are an item or an item reprinted as a book (i.e. scanitem_ptr->book_type != '1')
either do not have an original title (scanitem_ptr->origttl_ptr) or have the same original title, title additional and publication details as the previous item
are not a piece of anonymous artwork (there are simply too many instances and it's not something anybody's likely to be interested in)
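The consolidation test can be written as a single predicate over the fields listed above. This is an illustrative sketch: struct item_key is a hypothetical cut-down SCANITEM, the anon_artwork flag stands in for however anonymous artwork is actually detected, and the two parts of the title are compared separately rather than concatenated.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of SCANITEM fields; NULL means the field is absent. */
struct item_key {
    const char *auth, *ttad, *titl, *coauth, *secnam, *subject, *ednote;
    const char *pubdet;            /* first two characters = item type */
    const char *origtad, *origttl; /* original title additional/title */
    char authtype, book_type;      /* book_type: '1' = Book */
    int anon_artwork;              /* assumed flag for anonymous artwork */
};

/* NULL-safe string equality: NULL only matches NULL. */
static int same_str(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Return 1 if 'next' can be consolidated with 'prev', following the list above. */
int can_aggregate(const struct item_key *prev, const struct item_key *next)
{
    if (!same_str(prev->auth, next->auth) || prev->authtype != next->authtype)
        return 0;
    if (!same_str(prev->ttad, next->ttad) || !same_str(prev->titl, next->titl))
        return 0;
    if (!same_str(prev->coauth, next->coauth) || !same_str(prev->secnam, next->secnam))
        return 0;
    if (!same_str(prev->subject, next->subject) || !same_str(prev->ednote, next->ednote))
        return 0;
    if (prev->pubdet == NULL || next->pubdet == NULL
        || strncmp(prev->pubdet, next->pubdet, 2) != 0)
        return 0;                              /* different item type */
    if (prev->book_type == '1' || next->book_type == '1')
        return 0;                              /* only items and items reprinted as books */
    if (next->origttl != NULL
        && !(same_str(prev->origttl, next->origttl)
             && same_str(prev->origtad, next->origtad)
             && same_str(prev->pubdet, next->pubdet)))
        return 0;                              /* original title must match exactly */
    return !next->anon_artwork;
}
```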

Note that this is (and should be) checking the same fields as were used to set up the Item Header except for:

checking they have the same series name (XVALIDATE ensures this must always be true)
checking they have the same byline (not sure what's happening here)
checking they have the same subject (not sure what's happening here)
handling of books (currently bodged)

As discussed in the section on aggregation, we can't handle these linearly, so we need to read them all and store the details in a series of interlinked structures that we can then sort, scan and manipulate as required. These are split into separate structures to try to minimise memory usage: some groups have large numbers of simple items (there are over 1200 instances of John North's "Where to Go and How to Get There" column) while others have a small number of complex items (e.g. where the title or byline varies from instance to instance), and we don't want the memory overhead of allowing for the latter for all instances of the former.

At the top level we have the Group Array (group_str). Each instance of this identifies all occurrences under a single byline in a given file, because we want to aggregate items in the same (magazine) file under the same byline (e.g. a repeating column or a serial). Each item in this array contains:

filabb: the file abbreviation (scanitem_ptr->filabb_ptr): as discussed this allows us to aggregate all items in the same magazine even if the magazine abbreviation changes; however it works less well for book files where we don't want to aggregate items so a separate group is created for each instance in a book file. This can lead to a rather large number of groups (e.g. for "Anon.~About the Author").
edition: the edition (scanitem_ptr->edition)
- for items: 1=same as book/mag ID; 2=reprint
- for books: standard edition code from Book Record
- X=dummy record for PSEUD.CVT entries
book_type: the book_type (scanitem_ptr->book_type) (1 = Book; 2 = Item; 3 = Item Reprinted as Book)
pbdt_sub: an array of subscripts into the Item Array
pbdt_cnt: count of the items in the group
xtrsub: subscript into the Group Extras Array if necessary
parent_sub: this is used to link groups together while the groups are being shuffled, but thereafter is set either to -1 for an original group or a reprint group for which no parent could be found, or to -2 for a reprint group with a parent.

Most groups are quite simple but, in some cases, additional information is needed. As there might be hundreds of groups we try to save memory by having a separate array of these, the Group Extras Array (group_xtr), which is used only when necessary. Each entry contains:

byline: the byline on the item (scanitem_ptr->byline_ptr)
origtitl: the original title if there was one (scanitem_ptr->origtad_ptr + scanitem_ptr->origttl_ptr)
pubdet: the original publication details (scanitem_ptr->pubdet_ptr) if there was an original title
origpub: the original publication details (scanitem_ptr->prvpub_ptr) if we are dealing with a book
bokttl_sub: link into the book title array if we are dealing with a book

Note that the contents of this array are used in three different ways so we need to be careful with them:

If the only field that's needed is the byline then we can reuse the entry for all matching groups as the byline is simply output after the list of items.
However, if the original title is different then we only want to aggregate items that are identical - i.e. have both the same original title and the same publication details.
If we are handling reprint books then we can never aggregate them. (Note that we always need one of these for a reprint book, even if there are no "original publication details", to cope with books where the (original) edition was '2' (i.e. simultaneous first edition).)

Below these lies the Item Array (item_str): we have one of these for each distinct item (as of 2024, one group needed almost 1500 instances). Each contains:

pubdet: the original publication details (scanitem_ptr->pubdet_ptr)
done_original: flag to say if we have done the original appearance (0=No/Not Needed; 1=Yes & this is it; 2=Yes) (scanitem_ptr->done_original)
magid: the current publication details (scanitem_ptr->magid_ptr)
cvtmagid: the current publication details in "old format" (scanitem_ptr->magid_ptr converted if it starts with '|' or copied otherwise)
pubyr: year of publication (first 4 digits of scanitem_ptr->dtpubl_ptr)
repsub: subscript into the Item Extras Array if necessary

Item Extras Array (item_rep): As with the Group Extras Array this is hived off to save space and contains, where relevant:

reptitl: the reprint title (scanitem_ptr->repttl_ptr)
repauth: the reprint author (scanitem_ptr->repaut_ptr)

The routine first sets up the entries for the first item (i.e. the one in scanitem_ptr when the routine was first called) – in the vast majority of cases this will be the only item in the set. It then reads the next item and checks to see if it is the same (as discussed above). If so then it aggregates the item with the previous one(s). It first checks to see if it can be added to an existing group, i.e.:

the byline matches that on the group (or neither has a byline)
the file abbreviation exists and is the same (the former check is because book files don't have a file abbreviation and we don't want to aggregate all items in book files together; this needs changing as it currently means a new group for each instance of an item in a book file and for some items there are over 100 of these)
the edition is the same

If not, it creates a new group. It then creates an entry in the Item Array (and possibly Item Extras Array) and links it to the matched or new group. It then goes back to read the next item and repeats the above.
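The group-matching step can be sketched as a linear search with a fallback to creating a new group. The struct layout, the MAX_GROUPS capacity and the function name link_item_to_group are hypothetical; in particular the byline is held directly in the group here, whereas the text above keeps it in the Group Extras Array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_GROUPS 64   /* hypothetical fixed capacity */

/* Cut-down, hypothetical Group Array entry. */
struct group {
    char filabb[16];    /* file abbreviation; "" for book files */
    char edition;       /* edition code, e.g. '1' or '2' */
    char byline[64];    /* "" if there is no byline */
    int  pbdt_cnt;      /* items linked to this group */
};

/*
 * Link an item to an existing group if possible, otherwise start a new one.
 * Returns the group subscript, or -1 if the array is full.
 */
int link_item_to_group(struct group *groups, int *ngroups,
                       const char *filabb, char edition, const char *byline)
{
    for (int i = 0; i < *ngroups; i++) {
        struct group *g = &groups[i];
        if (strcmp(g->byline, byline) == 0       /* same byline, or neither has one */
            && filabb[0] != '\0'                  /* book files: always a new group */
            && strcmp(g->filabb, filabb) == 0
            && g->edition == edition) {
            g->pbdt_cnt++;
            return i;
        }
    }
    if (*ngroups >= MAX_GROUPS)
        return -1;

    struct group *g = &groups[*ngroups];
    snprintf(g->filabb, sizeof g->filabb, "%s", filabb);
    snprintf(g->byline, sizeof g->byline, "%s", byline);
    g->edition = edition;
    g->pbdt_cnt = 1;
    return (*ngroups)++;
}
```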

Part 3: Tidy Groups

Once we have all the arrays set up, we need to tidy them up and sort them into the order necessary.

First we look at each reprint group (edition=2 or book_type=3) and try to find a prior group with an item with matching publication details. There will generally be one, because the code in SETUP_STYAUT_IDX always tries to create one, and the presence of the edition in the sort order will sort it before the reprints (and hence the group will be set up earlier). If it finds a match then it saves the Group ID as the Parent ID.

It then runs through the groups again, looking for original groups (edition=1 or book_type=1) or an orphaned reprint (i.e. parent_sub = -1) adding each to the ordered list of groups. For each group it finds it then checks the remaining groups for which it is parent and adds them next before moving on. Thus, for a complex set-up we should end up with:

First original group (this will be the one with the earliest publication date)
- First group with reprints from this group
- Second group with reprints from this group
- etc.
Second original group
- First group with reprints from this group
- etc.
etc.
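The two passes (parent linking, then ordering) can be sketched as follows. This is a simplified, hypothetical version: each group carries a single pubdet string instead of an array of items, any earlier group that is an original or an orphaned reprint may act as a parent, and struct grp and order_groups are invented names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, cut-down group: one representative pubdet per group. */
struct grp {
    char edition;        /* '1' = original appearance, '2' = reprint */
    char book_type;      /* '1' book, '2' item, '3' item reprinted as book */
    const char *pubdet;  /* original publication details */
    int parent_sub;      /* parent subscript; -1 or -2 once ordered */
};

static int is_reprint(const struct grp *g)
{
    return g->edition == '2' || g->book_type == '3';
}

/*
 * Pass 1 links each reprint group to an earlier original or orphaned
 * reprint with the same pubdet; pass 2 writes the display order into
 * order[] (which must hold n entries) and returns the number written.
 */
int order_groups(struct grp *g, int n, int *order)
{
    int cnt = 0;

    for (int i = 0; i < n; i++) {
        g[i].parent_sub = -1;
        if (!is_reprint(&g[i]))
            continue;
        for (int j = 0; j < i; j++) {
            if (g[j].parent_sub == -1 && strcmp(g[j].pubdet, g[i].pubdet) == 0) {
                g[i].parent_sub = j;
                break;
            }
        }
    }

    for (int i = 0; i < n; i++) {
        if (g[i].parent_sub != -1)
            continue;                  /* placed under its parent */
        order[cnt++] = i;              /* original or orphaned reprint */
        for (int k = i + 1; k < n; k++) {
            if (g[k].parent_sub == i) {
                order[cnt++] = k;
                g[k].parent_sub = -2;  /* reprint with a parent */
            }
        }
    }
    return cnt;
}
```

For example, with groups {original A, original B, reprint of A, reprint of an original not in the index}, the resulting order is A, reprint of A, B, orphaned reprint.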

Part 4: Display Items

Background

We now need to work out how to display the information we have. In the simplest case of a standalone, original, item (i.e. 1 group, 1 item) we have a single line of the form:

<itmhdrbuf>publication details<subjbuf>, e.g.
* The House of the Four Winds, (br) Interzone #254, September 2014 [Ref. Mercedes R. Lackey]

A slightly more complicated case is when we have a single item that has been reprinted, and where both appearances are in the index, in which case we have:

<itmhdrbuf>original publication details<subjbuf>
     reprint publication details, e.g.
* Cornel Wilde, (bg) Ranch Romances 1st November 1952 [Ref. Cornel Wilde]
   • Ranch Romances (UK) #4, 1953

Note that, in this case we would have two groups each with a single item. In general this will be the case whether or not the original item is in the database because of the fake originals we create in SETUP_STYAUT_IDX as discussed here. However, there are occasions when we do not create fake originals (e.g. when the original title was different) and we have to handle these cases as well, as discussed below.

The real "fun" with aggregation, though, comes when we have multiple instances of the same item (e.g. columns or similar) which may, in turn, be reprinted so we end up with something like:

* Ranch Flicker Talk, (mr) Ranch Romances 1st Mar, 2nd Mar 1952
   • Ranch Romances (UK) #1, #2 1952

where the magazine name is only output once in each instance, the year is only output when it changes and the dates are abbreviated, all in an attempt to save space. Note that the whole of "Ranch Romances 1st Mar" is hyperlinked to the first issue and "2nd Mar" to the second issue, and so on. The year is displayed in bold to help it stand out from the surrounding text. This formatting is all handled by special format types passed through to FORMAT_ISSUE_LINK and the code has to work out which format to use based on the surrounding items, as discussed below.

All of this is further complicated if the reprints appear under different titles or bylines from the original and/or if the items first appeared under a pseudonym and are now being displayed in the entry for the real author. These are handled by the Item Extras Array **More Detail Needed**

Partly for historical reasons, the rest of the code in this section falls into two halves:

The first part formats and outputs the original publication details (pubdet) from the item array
The second part formats and outputs the reprint publication details (magid) from the item array

If the item is original then it just does the first part; if it is a reprint for which we have done the original then it just does the second bit; if it is a reprint for which we have not done the original then it does both (with a bit of fancy footwork in the middle). This can be determined for items by looking at the parent_sub and edition flags in the group array:

parent_sub = -1 and edition = 1: this is an original item so we just do the first bit
parent_sub = -2 and edition != 1: this is a reprint for which we have done the original so we just do the second bit
parent_sub = -1 and edition != 1: this is a reprint for which we have NOT done the original so we do both
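The three cases map directly onto a small decision function. The enum and function names are hypothetical and edition is assumed to be a single character; books, which are handled differently, fall into the catch-all value.

```c
#include <assert.h>

/* Which half (or halves) of the display code to run for a group. */
enum show_mode {
    SHOW_ORIGINAL,   /* original item: first part only */
    SHOW_REPRINT,    /* reprint whose original is shown: second part only */
    SHOW_BOTH,       /* reprint with no original shown: both parts */
    SHOW_OTHER       /* combination not covered above (e.g. books) */
};

enum show_mode decide_show_mode(int parent_sub, char edition)
{
    if (parent_sub == -1)
        return (edition == '1') ? SHOW_ORIGINAL : SHOW_BOTH;
    if (parent_sub == -2 && edition != '1')
        return SHOW_REPRINT;
    return SHOW_OTHER;
}
```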

Books are currently handled differently.

Some Key Variables

need_header determines whether or not the item header in itmhdrbuf needs to be output. This is initially set to PSP_TRUE and reset to PSP_FALSE when we have output the header. Within a group it is also reset to PSP_TRUE if we have a group that represents a first edition or an orphaned reprint - e.g. if we have an unconnected set of items that share the same author/title/type etc. (such as a column that appears in multiple magazines) we want to output the item header before each distinct group of items. (Anon.~About the Author~bg is the most extreme example).

need_maghdr determines whether or not we need the magazine name displayed. This is always true for the first item in a batch, but is suppressed if we have multiple issues from the same magazine (see the discussion of aggregation above).

doing_issues is primarily used to flag when we have output some issues and hence need to close off the <LI> construct. It is set to PSP_TRUE when we output the item header (as that must be followed by issues of some kind). It is then checked at the end of each group and, if set, the code outputs the </LI> and additional trailer material (see below) and sets the flag back to PSP_FALSE. It is also involved in handling cases where we want to output both original and reprint information for a single item: we flush the original details, set the flag to PSP_FALSE and then use that to spot that we need to start a new list for the reprint details.

doing_indent is used to track if we have an indented list of reprints. There might be multiple lines of these, so we need to know both that we've already started the indent (so that we don't keep doing it) and that we've done one at all (so that we can switch it off at the end of the group).

Group Setup

The code loops through each group and first sets up bylinebuf to contain any original title and/or byline (the latter if we haven't already done it - clarify). It also checks to see if this is a first appearance (edition=="1") or an orphan group, either of which indicates the start of a new "batch", and, if so, sets the need_header flag. In all cases it also sets the need_maghdr flag (we always need the magazine name displayed at the start of a new group).

Item Setup

It then loops through each item in the group.

It first checks to see if there is another item following (setting itm2_sub to point to it if so) and, if so, if it is in the same group as the current one (setting same_group accordingly). It also increments the count for the different types of item if this is an original appearance or a reprint for which we haven't done the original (we need to do it somewhere so might as well do it here).

It then sorts out how much we want to display of the next set of details. By default we set the format type to FPDT_IDXGEN (i.e. the normal display) but:

If there is a next item, in the same group and for the same year then we reset it to either FPDT_IDXGN1 or FPDT_IDXGN2 depending on the setting of need_maghdr (the former includes the magazine name; the latter doesn't; both abbreviate the month and suppress the year) and set do_year to false (as the next item is for the same year)
Else, if there are multiple items in this group, we still set the format to either FPDT_IDXGN1 or FPDT_IDXGN2 as above but set do_year to true (as there is either no next item, or it is for a different year)
As a special bodge, if we set it to FPDT_IDXGN2 but there is only the year in the publication details, then we set it to FPDT_IDXGN3 which includes the year (as, otherwise, there is nothing to link with) and set do_year to false again.
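The three rules above can be sketched as a single selection function. This is a hedged sketch: the numeric values of the FPDT_ constants, the choose_format function and its boolean parameters (has_next, same_group, same_year, multi_item, year_only) are assumptions, and the default of do_year = false for the normal FPDT_IDXGEN display is also an assumption, on the basis that the normal display already includes the year.

```c
#include <assert.h>
#include <stddef.h>

#define PSP_TRUE  1
#define PSP_FALSE 0

/* Hypothetical values for the format-type constants named in the text. */
enum fpdt { FPDT_IDXGEN, FPDT_IDXGN1, FPDT_IDXGN2, FPDT_IDXGN3 };

/* has_next/same_group/same_year describe the following item, multi_item
   is true when the group has more than one item, and year_only is true
   when the publication details hold nothing but the year. *do_year
   receives whether the year should be appended after the link. */
static enum fpdt choose_format(int has_next, int same_group, int same_year,
                               int multi_item, int need_maghdr,
                               int year_only, int *do_year)
{
    enum fpdt fmt = FPDT_IDXGEN;
    assert(do_year != NULL);
    *do_year = PSP_FALSE;   /* assumption: normal display shows the year */

    if (has_next && same_group && same_year) {
        /* Next item shares the year, so suppress it here. */
        fmt = need_maghdr ? FPDT_IDXGN1 : FPDT_IDXGN2;
        *do_year = PSP_FALSE;
    } else if (multi_item) {
        /* No next item, or a different year: append the year. */
        fmt = need_maghdr ? FPDT_IDXGN1 : FPDT_IDXGN2;
        *do_year = PSP_TRUE;
    }

    /* Special bodge: GN2 with only a year would leave nothing to link. */
    if (fmt == FPDT_IDXGN2 && year_only) {
        fmt = FPDT_IDXGN3;
        *do_year = PSP_FALSE;
    }
    return fmt;
}
```

Putting the FPDT_IDXGN3 check last mirrors the description, where it is applied as an override after the other two rules.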

Lastly, if the need_header flag is set, the routine outputs the contents of itmhdrbuf, possibly adjusting it to change the byline (expand), and resets the need_header flag. It also sets the doing_issues flag.

Display the Original Details

As mentioned above, this is done if the item is an original appearance or a reprint for which the done_original flag isn't set. Specifically, it checks that:

It's not the same as the last item we did (when we create a fake original this will often duplicate an item already in the database and we don't want to list it twice)
The done_original flag is not set to "2" (i.e. it's either an original, a fake original, or a reprint for which we didn't create a fake original)
The item type isn't either "en" or "el" (why?)
The group edition is "1", there is only one item in the group, it's a book, or the need_maghdr flag is set (why?)
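The four checks above combine into one predicate, sketched below. The orig_check structure and show_original function are hypothetical, as are the sample values in the usage check; the string comparisons assume done_original, the item type and the edition are held as strings, which is how the text quotes them ("2", "en", "el", "1").

```c
#include <assert.h>
#include <string.h>

#define PSP_TRUE  1
#define PSP_FALSE 0

/* Hypothetical view of the fields the checks need. */
struct orig_check {
    int same_as_last;          /* duplicates the item output just before */
    const char *done_original; /* "2" means skip, per the description */
    const char *item_type;     /* "en" and "el" are excluded */
    const char *edition;       /* group edition */
    int group_items;           /* number of items in the group */
    int is_book;
    int need_maghdr;
};

/* Returns PSP_TRUE if the original details should be displayed. */
static int show_original(const struct orig_check *c)
{
    assert(c != NULL);
    if (c->same_as_last)
        return PSP_FALSE;
    if (strcmp(c->done_original, "2") == 0)
        return PSP_FALSE;
    if (strcmp(c->item_type, "en") == 0 || strcmp(c->item_type, "el") == 0)
        return PSP_FALSE;
    return (strcmp(c->edition, "1") == 0 || c->group_items == 1 ||
            c->is_book || c->need_maghdr) ? PSP_TRUE : PSP_FALSE;
}
```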

It then formats the link in pubdet (a bit of a bodge here that needs clarifying) and adds it to issuebuf, followed by the year if the do_year flag is set.

Reset Issue Buffer

If we are outputting both the original and reprint details for a single item, then we need to flush the set we have just built and lay the groundwork for the next part. At the moment we check both that the doing_issues flag is set (which it will be if we have done the original printing) and that we are the first item in the group. We check doing_issues rather than checking for parent_sub = -1 and edition != 1 in order to cope with items reprinted as books. We need to check that we're the first item in the group as we might have multiple items (e.g. parts of a serial) all reprinting from the same original.

All the code does here is:

Add the subject if we have one and reset the need_subject flag; this ensures the subject(s) are displayed with the original data rather than the reprint data. Note that we can't delete the subject as we will need it for any other original groups for this author/item (e.g. multiple reviews of the same book by the same author in different venues)
Add the byline if we have one and reset bylinebuf for similar reasons, although in this case we won't need it again as the byline is specific to each group.
Terminate the list item unless we're doing a column (why?)
Output the contents of issuebuf, reset it and set the doing_issues flag back to PSP_FALSE
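The reset step might be sketched as follows. This is an assumption-laden sketch: the reset_state structure, flush_original and append helpers, and the fixed buffer sizes are hypothetical, and it assumes the subject, byline and </LI> are all appended to issuebuf (in the order listed above) before issuebuf is written out, since the text does not say exactly where each piece goes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PSP_TRUE  1
#define PSP_FALSE 0

/* Append src to the NUL-terminated string in dst (capacity cap),
   truncating rather than overflowing. */
static void append(char *dst, size_t cap, const char *src)
{
    size_t len = strlen(dst);
    if (len + 1 < cap)
        snprintf(dst + len, cap - len, "%s", src);
}

/* Hypothetical bundle of the buffers and flags used by the reset step. */
struct reset_state {
    char subject[64];
    int need_subject;
    char bylinebuf[64];
    char issuebuf[128];
    int doing_issues;
    int doing_column;
};

/* Flush only when the original has been done (doing_issues) and we are on
   the first item of the group. Returns PSP_TRUE if it flushed. */
static int flush_original(struct reset_state *s, int first_in_group,
                          char *out, size_t out_cap)
{
    assert(s != NULL && out != NULL);
    if (!s->doing_issues || !first_in_group)
        return PSP_FALSE;
    if (s->need_subject && s->subject[0] != '\0') {
        append(s->issuebuf, sizeof s->issuebuf, s->subject);
        s->need_subject = PSP_FALSE;   /* subject text kept for later groups */
    }
    if (s->bylinebuf[0] != '\0') {
        append(s->issuebuf, sizeof s->issuebuf, s->bylinebuf);
        s->bylinebuf[0] = '\0';        /* byline is specific to this group */
    }
    if (!s->doing_column)
        append(s->issuebuf, sizeof s->issuebuf, "</LI>");
    append(out, out_cap, s->issuebuf);
    s->issuebuf[0] = '\0';
    s->doing_issues = PSP_FALSE;       /* reprint code will open a new list */
    return PSP_TRUE;
}
```

Clearing need_subject but leaving the subject text in place reflects the note that the subject may still be needed for other original groups.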

Display the Reprint Details

This is done whenever the group is a reprint group (i.e. edition != 1) and the item has some reprint information (magid non-empty).

If this is the first of this batch of reprints, we need to output the indent (<UL>) to start them. In addition, if we're not already doing some issues, then we need to output the <LI> to start the issue list; in this case we also need to change the format type to FPDT_IDXGN1 if it was FPDT_IDXGN2, to ensure the magazine name is output at the start of the list.
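The start of a reprint batch can be sketched as below, using doing_indent to open the <UL> only once. The start_reprint and append functions and the FPDT_ constant values are hypothetical, and setting doing_issues when the <LI> is opened is an assumption (it lets the end-of-group code know there is a list item to close).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PSP_TRUE  1
#define PSP_FALSE 0

/* Hypothetical values for the format-type constants named in the text. */
enum fpdt { FPDT_IDXGEN, FPDT_IDXGN1, FPDT_IDXGN2, FPDT_IDXGN3 };

/* Append src to dst (capacity cap), truncating rather than overflowing. */
static void append(char *dst, size_t cap, const char *src)
{
    size_t len = strlen(dst);
    if (len + 1 < cap)
        snprintf(dst + len, cap - len, "%s", src);
}

/* Open the reprint indent once per batch and, if no issue list is open,
   start one and make sure the magazine name is shown. Returns the
   (possibly adjusted) format type. */
static enum fpdt start_reprint(int *doing_indent, int *doing_issues,
                               enum fpdt fmt, char *out, size_t cap)
{
    assert(doing_indent != NULL && doing_issues != NULL && out != NULL);
    if (!*doing_indent) {
        append(out, cap, "<UL>");
        *doing_indent = PSP_TRUE;
    }
    if (!*doing_issues) {
        append(out, cap, "<LI>");
        *doing_issues = PSP_TRUE;      /* assumed: closed at end of group */
        if (fmt == FPDT_IDXGN2)
            fmt = FPDT_IDXGN1;         /* GN1 includes the magazine name */
    }
    return fmt;
}
```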

We then check to see if this reprint has a different author or title (clarify) and, if so, set them up in bylinebuf and output the issue details.

Finish off the Group

When we have finished processing all the items in a group, we just need to do some housekeeping. First, if we were doing some issues (as is likely), we need to output the subject(s), byline(s) and ED notes (if we haven't already done them). We also need to close off the issue list (unless we're doing a column - why?) and, if we have indented some issues, close off the indentation.
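The housekeeping can be sketched as below. The group_end structure, finish_group and append functions are hypothetical, and the sketch assumes the trailer material (subject, byline, ED notes) goes out before the </LI>, with the </UL> closed last so that the reprint <UL><LI> pair opened earlier nests correctly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PSP_TRUE  1
#define PSP_FALSE 0

/* Append src to dst (capacity cap), truncating rather than overflowing. */
static void append(char *dst, size_t cap, const char *src)
{
    size_t len = strlen(dst);
    if (len + 1 < cap)
        snprintf(dst + len, cap - len, "%s", src);
}

/* Hypothetical bundle of the end-of-group state. */
struct group_end {
    int doing_issues;
    int need_subject;
    char subject[32];
    char bylinebuf[32];
    char ed_notes[32];
    int doing_column;
    int doing_indent;
};

/* Emit any outstanding trailer material, close the list item (unless
   doing a column), then close the reprint indent if one was opened. */
static void finish_group(struct group_end *g, char *out, size_t cap)
{
    assert(g != NULL && out != NULL);
    if (g->doing_issues) {
        if (g->need_subject) {
            append(out, cap, g->subject);
            g->need_subject = PSP_FALSE;
        }
        append(out, cap, g->bylinebuf);   /* empty if already done */
        g->bylinebuf[0] = '\0';
        append(out, cap, g->ed_notes);
        g->ed_notes[0] = '\0';
        if (!g->doing_column)
            append(out, cap, "</LI>");
        g->doing_issues = PSP_FALSE;
    }
    if (g->doing_indent) {
        append(out, cap, "</UL>");
        g->doing_indent = PSP_FALSE;
    }
}
```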