The FictionMags Index Family
The v2 Index Project

Introduction

At the moment it is not possible to regenerate the FictionMags Indexes with the original suite of programs written by Bill Contento, and it is unclear when it might again be possible to do so. In an attempt to address this, Phil Stephensen-Payne has developed a new suite of programs (v2) and this document will attempt to provide some background on the project and to highlight some of the differences between the indexes generated by the two sets of software.

A couple of points should probably be emphasized at the outset:

Bill's programs are extremely sophisticated and have been developed over a period of 40 years, during which many problems have been encountered and overcome. It is inevitable that some of these problems will reappear in the v2 programs (as well as exciting new ones!) so it is likely to be some time before the v2 indexes reach the same level of sophistication as the v1 indexes.
As the v2 programs are being developed from a completely different code base to the v1 programs, it is inevitable that some things that were easy to achieve in the v1 indexes might be much harder to achieve with the v2 programs, so some features may be delayed for a while (or indeed never implemented). Fortunately, the flip side is also true: some things that proved impossible to achieve in the v1 indexes have proved much easier with the v2 programs.

The current intention is to generate a set of indexes that, in general, look and feel as close to the v1 indexes as possible, partly because that format is familiar to the users, but mainly because it has proven to be an excellent format so there seems little need to mess with it.

The remainder of this document is divided into the following sections for ease of reference:

New features in the v2 indexes
Current implementation progress
Known differences
Discussion points

The documentation of the Index Structure also gives some more specific details on the two implementations. For a discussion of recent changes to the software, as well as a list of known problems and possible future enhancements, see the Change Log.

As always, all comments are welcome and should be sent to

New Features in the v2 Indexes

There are a number of new/different features in the v2 Indexes:

Inclusion of Book Files

Although an earlier version of the v1 programs was capable of handling both books and magazines (cf. The Locus Index to Science Fiction: 1984-1998), the current version is not, because of changes made to support the FictionMags Index Family properly. The v2 programs have been written so that they can handle both books and magazines, and the intention is that the v2 Indexes will include both.

However, the data files containing the book data are in rather a poor state, partly because they are an amalgam of multiple, disparate sources (e.g. Al Hubin's Index to Crime Fiction IV and the Miscellaneous Anthologies Index) and partly because little work has been done over the years to bring them up to the same standard as the magazine data files. From time to time I have tried to do something about this, but after five or more years I have only managed to "clean up" about 5% of the data. At the moment I am not sure how to address this.

In addition, there are various questions about how book files should best be handled, as the old Locus index had a number of problems. For that reason, although an early "proof of concept" version of the Book Author Index was demonstrated, I intend to concentrate on the magazine indexes (which are all that will be published for the time being) and then revisit the book index when things quieten down.

Inclusion of Pseudonymous Stories in Author Lists

There was an anomaly in the v1 indexes whereby, if Author A wrote a story under house name B, the story would be listed under both A and B; however, if Author A wrote a story under private pseudonym C, the story would only be listed under the pseudonym. This has been addressed in the v2 indexes so that all items known (or suspected) to be by Author A are listed under Author A (though this has raised its own issues, as discussed under Known Issues below). One side-effect of this is that the "also as xxx" clauses in v1 are no longer needed, as the details of such alternate titles/authors are listed in full. See, for example, Edward Aarons in v1 & v2.

Portability and Flexibility

In an attempt to avoid a repeat of the current situation, the v2 programs have been written so that they should be usable on any computer (though this has yet to be tested). The generation of each index is controlled by an Index Configuration File, and the page layouts are based on external boilerplate files, which should allow a degree of flexibility without requiring program changes.

Minor implementation differences

The following minor differences have been implemented deliberately (but are still open to debate):

Page Sizes: The point at which page breaks occur will be different in the new indexes (I doubt it would be possible to get them exactly the same). The new algorithms are based around minimum and maximum page sizes such that, if a page has exceeded the minimum page length and reaches a natural break (e.g. a new issue in the Magazine Issue Listings) then a new page will be started and, if no such break is found, a new page will always be started when it exceeds the maximum page length. For the current version, the minimum and maximum have been set to 300 and 1000 (though note that, technically, what is counted is the number of input data lines, not the number of lines output to the index).
Issue Dates: The format of magazine issue dates has changed slightly in most cases (except when dealing with aggregated items). Some examples include:
- "Mar 1933" is now "March 1933"
- "Sep 1 1929" is now "September 1, 1929"
- "Sep #2 1927" is now "2nd September 1927"
In addition, where a magazine issue is identified by whole issue number and month (e.g. American Dollar Western Magazine) both issue number and month are listed (e.g. American Dollar Western Magazine #3, November 1953). For example compare the listings for E. Hamilton Clay in v1 & v2.
Secondary Authors: The display of secondary authors has been changed slightly, as discussed under the description of Issue Listings. In addition, if we have a note saying "translated from" and a secondary author labelled as a translator, an attempt is made to combine the two into a single coherent note; something similar is attempted with "adapted from/by". For example, compare the listings for Rudolph Baumbach in v1 & v2.
Author References: In v1, items that referred to an author (i.e. appeared in the Story Author Listings under the ", [ref.]" sub-heading), were always put inside double quotation marks. This struck me as confusing, partly because it looks odd when used on fiction types (such as "sa") and partly because it doesn't easily distinguish between items whose title was already in double quotation marks and those that weren't. For these reasons, the quotation marks are omitted from v2. For example, see the subject entries for John Greenleaf Whittier in v1 & v2.
Authors with trailing ", Mrs." or ", Miss": Because of the way the data is held internally, some names were displayed in v1 with a trailing ", Mrs." or ", Miss" (or similar). In v2 these have been moved to the front of the name in the usual way. For example see the final entry for Anthony Alger in v1 & v2.
Item Note Text: In v1, in the Magazine Issue Listings, all Item Notes are appended to the item details; (partly) as an experiment, in v2 Item Appearance Notes appear on the same line as the item details while Item Notes appear on the next line - I would welcome views on this. See for example the October 1890 issue of Short Stories in v1 & v2.
Illustrators: In v1, in the Magazine Issue Listings, the illustrator(s) for an item are displayed after the Item Note Text, which strikes me as a little odd (particularly when the notes are extensive). In v2 I've moved these to follow the Item Type/Prior Publication Details. For example, see the penultimate entry in the August 1894 issue of The Strand Magazine in v1 & v2.
Subject Names: In v1, subject names are simply displayed in square brackets. I felt this could be a little confusing for users so, in v2, I have prefaced the names with "Ref. ". For example see the second entry in the October 1894 issue of The Strand Magazine in v1 & v2.
Extracts: In v1, if an item was qualified with a phrase such as "from <Book Title>", the text was displayed as part of the Item Title (where it is held in the underlying data); in v2 it has been moved to sit between the Item Type and the Prior Publication Details, where it seems to make more sense. For example, see the entry toward the end of the January 1917 issue of The Strand Magazine in v1 & v2.
Incomplete Listings: The data files identify Incomplete Listings by means of a flag, an issue note saying something like "Issue not Found" and an item record saying "Need Contents". In v1, both of the latter are displayed, but in v2 I do not display the "Need Contents" line, partly because that is simply easier given the way I've written the code and partly because it seems redundant. In addition, in v1 these issues are just flagged in the bottom level index with (Placeholder); this has been changed in v2 to (Incomplete) for issues that have been partially indexed and (Missing) for those for which we have no contents information at all. For example, see the entries for Ace G-Man Stories (Canada) in v1 & v2.
Columns with & without item titles: In v1, in the Story Author Index, if only some instances of a column had item titles, the column is listed twice - once aggregated without item titles followed by a separate listing of those with titles (see Stookie Allen's "Illustrated Crimes" for example); in v2 these have been combined with the standalone entry acting as the column header.
Aggregated items: One of the more powerful features of the Story Author Index is the aggregation of repeated items so that, for instance, all parts of a serial are listed on a single line, with each of the constituent issues separately listed and hyperlinked, and with issue dates abbreviated to avoid repetition of magazine name and year. In some very complex cases, such as John North's Where to Go and How to Get There, this already presents some problems. I've tried to address these in v2, but more work may be needed.
Magazine "see under" links: In v1, magazine "see under" links always pointed to the same level of the index. Thus, for instance, the link for Two-Gun Western Novelets Magazine in the bottom-level index took you to Two-Gun Western in the bottom-level index, while the link in the listings took you to Two-Gun Western in the listings. At the moment, in v2, both links take you to the listings level, partly because this was easier to do and partly because I felt it made more sense; I would welcome opinions on this.
Story Title Index: In v1 this was a simple 2-level index but the first page had become huge (6400 lines) for the FMI so the decision was taken to convert it to a 3-level index in v2. In addition, in v1 the story titles were sometimes abbreviated with a trailing ellipsis (see the end of the second line) - it is unclear what the purpose of this was so it has not (yet) been implemented in v2.
Chronological Index: In v1, if a single author was continued over multiple pages, no continuation lines were output to the intermediate-level index; in v2 continuation lines have been added identifying the date of the first item on each continuation page. In addition, in v1, if an item was reprinted under a different title but by the same author, then both titles were listed under the original publication date (e.g. "The Arizona Kid"/"Arizona Yellow" by Stuart Adams), which struck me as confusing; in v2 only the original title is listed.
Magazine Issue Index: In v1, no matter how many issues a single magazine had, there were never any continuation pages in the Bottom Level Index leading to some very large pages; in v2 continuation pages have been added and continuation lines have been added to the intermediate-level index identifying the date of the first issue on each continuation page. See for example Short Stories in v1 & v2.
Decoded URLs: Wikipedia increasingly uses accented characters in URLs, such as https://en.wikipedia.org/wiki/Emilia_Pardo_Bazán. As these cause problems in the multi-system environment, they are stored in the database in their percent-encoded form, such as https://en.wikipedia.org/wiki/Emilia_Pardo_Baz%C3%A1n, which is portable across all environments. In v1 these encoded URLs were displayed as stored, while in v2 an attempt is made to decode them back to the HTML equivalents of the original strings.
"Also as" authors: There are cases where we know a particular author has used multiple names but do not know which is their real name, and these are defined by also as records. The documentation states that these links should only be displayed when both authors have entries in the list, but in v1 they seem to be displayed in all cases (e.g. Adams, Stewart in the WFI). This has been corrected in v2.
Magazines formatted as books: Some magazines (such as New Worlds) published some issues as books rather than regular issues, where the "title" of the issue doesn't follow the usual format for magazine issues. In v1, when these were listed in the Magazine Issue Bottom Level Index, only the publication details were listed, with no indication of the issue titles (as with New Worlds Quarterly). At the same time, the entries for these in the Magazine Issue Listings did contain the book title, but did not contain the usual link back to the Bottom Level Index feature header (as here for New Worlds Quarterly); in v2 both problems have been corrected.
List of magazine issues for a given editor: In v1, editors of magazine issues have entries in the Story Author Listings that are designed to look like the "Pub. Info." entries (as here for Robert Lowndes); as the "Pub. Info" entries are being phased out, this has been changed in v2 to list all such issues as aggregated items with links to each individual issue.
Cross-index linking: There are five distinct indexes that are grouped into author order (Book Author, Story Author, Chronological, Artist and Biographical Notes), but the linkages between them in v1 seem a little incoherent. Of the four supported in v1, two (Story Author and Artist) provide "about" links to the Biographical Notes in both the Intermediate Level Index and the Listings, but the Chronological Index doesn't. The Story Author Index also provides links to both the Chronological and Artist Indexes, but only in the Listings, while the Chronological Index only provides links to the Story Author Index (also only in the Listings). The Biographical Notes provide links to the Story Author and Artist Indexes in both the Intermediate Level Index and the Listings, but not to the Chronological Index. In v2, cross-links are provided in both the Intermediate Level Index and the Listings from all five indexes to each of the other four (where relevant). Note that in v1, the Biographical Notes (only) also had an "assoc." link to the [ref.] section of the Story Author Listings for an author where relevant; I can't see the value of this, so have not implemented it in v2.
Handling of anonymous authors: In v1 anonymous authors were listed as Anonymous in the Story Author Index (as the primary entry), as Anon. in the Issue Index when a primary author and as uncredited when a secondary author. In v2 an attempt has been made to rationalise all of these into [uncredited] which has the added advantage that it sorts to the end of the Story Author Index along with all the other "odd" names like [The Readers] and [Various]. In addition, when used as a secondary author it is generally suppressed completely so that, for example, rather than "translated, uncredited" we have "(translated)".
Series listings: In v1 support was provided for multiple views of a given series: By Author, By Title and/or By Date, but the "By Author" view was always displayed even when inappropriate. In v2 this view can be suppressed if required.
Links from GCP Website: In v1, links from the "Big List" on the GCP Website went to the Magazine Issue Bottom Level Index; in v2 they go to the Magazine Issue Listings - this was primarily done because it makes the code much neater, but I'm inclined to think it is more useful as well.
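The Page Sizes rule described above can be sketched as a single pass over the input data lines. This is a minimal illustration, not the v2 code: it assumes each input line arrives paired with a flag saying whether it starts a natural break (such as a new issue), and whether the thresholds are inclusive is also an assumption. The defaults use the 300/1000 values quoted above, and, as in the text, it counts input data lines rather than output lines.

```python
from typing import List, Sequence, Tuple


def paginate(
    lines: Sequence[Tuple[str, bool]],
    min_len: int = 300,
    max_len: int = 1000,
) -> List[List[str]]:
    """Split (text, starts_natural_break) input lines into pages.

    A new page starts at a natural break once the current page holds at
    least min_len lines, or unconditionally once it holds max_len lines.
    """
    pages: List[List[str]] = []
    current: List[str] = []
    for text, starts_break in lines:
        long_enough = len(current) >= min_len and starts_break
        too_long = len(current) >= max_len
        if current and (long_enough or too_long):
            pages.append(current)  # close the page before this line
            current = []
        current.append(text)
    if current:
        pages.append(current)  # flush the final, possibly short, page
    return pages
```

With small thresholds for illustration (minimum 3, maximum 5), twelve lines with a natural break every fourth line give three pages of four lines, while twelve lines with no breaks at all are forced into pages of 5, 5 and 2 lines.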
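The Issue Dates conversions can likewise be sketched as a small formatter. The three input shapes are taken from the examples above; the month table, the ordinal helper and the choice to return any unrecognised date unchanged are assumptions of this sketch rather than a description of the v2 programs.

```python
# Three-letter month abbreviations as they appear in the examples above.
_ABBREVS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
_NAMES = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
MONTHS = dict(zip(_ABBREVS, _NAMES))


def ordinal(n: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def format_issue_date(raw: str) -> str:
    """Expand a v1-style issue date; unrecognised input is returned unchanged."""
    parts = raw.split()
    month = MONTHS.get(parts[0]) if parts else None
    if month is None:
        return raw
    if len(parts) == 2:                       # "Mar 1933"
        return f"{month} {parts[1]}"
    if len(parts) == 3 and parts[1].startswith("#") and parts[1][1:].isdigit():
        # "Sep #2 1927": the second September issue
        return f"{ordinal(int(parts[1][1:]))} {month} {parts[2]}"
    if len(parts) == 3 and parts[1].isdigit():  # "Sep 1 1929"
        return f"{month} {int(parts[1])}, {parts[2]}"
    return raw
```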
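For the Decoded URLs item, the stored percent-encoded form can be turned back into readable text with Python's standard library. Exactly what the v2 programs emit is not stated above, so this sketch shows two plausible outputs: the plain Unicode string, and an ASCII-only form with numeric character references for pages that must stay ASCII. Note that `unquote` decodes every escape, including reserved characters such as `%2F`; a real implementation might prefer to leave those encoded.

```python
from urllib.parse import unquote


def decode_url(stored: str) -> str:
    """Turn a percent-encoded URL back into readable Unicode text."""
    return unquote(stored, encoding="utf-8")


def to_html_chars(url: str) -> str:
    """Replace non-ASCII characters with numeric character references."""
    return url.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
```

For the example above, `decode_url` restores `Emilia_Pardo_Bazán`, and `to_html_chars` then renders the accented letter as `&#225;`, which a browser displays and follows as the original character.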
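The Handling of anonymous authors change amounts to mapping several spellings onto one pseudo-name and suppressing it in secondary credits. In the sketch below the set of variants comes from the text, but the helper names and the "(role by name)" format for a credited secondary author are hypothetical. The sort key illustrates why a bracketed name falls to the end: compared case-insensitively, "[" (U+005B) sorts after "Z" (U+005A). Real collation of accented surnames would need more care, since characters such as "É" sort after "[" in a plain code-point comparison.

```python
# Variant spellings named in the text; the full set in the data is an assumption.
ANONYMOUS_VARIANTS = {"anonymous", "anon.", "uncredited"}
UNCREDITED = "[uncredited]"


def normalise_author(name: str) -> str:
    """Map every anonymous spelling onto the single pseudo-name [uncredited]."""
    return UNCREDITED if name.strip().lower() in ANONYMOUS_VARIANTS else name


def secondary_credit(role: str, name: str) -> str:
    """Suppress an uncredited secondary author: 'translated, uncredited' -> '(translated)'.

    The '(role by name)' form for a credited author is a hypothetical format.
    """
    if normalise_author(name) == UNCREDITED:
        return f"({role})"
    return f"({role} by {name})"


def author_sort_key(name: str) -> str:
    """Case-insensitive key; bracketed pseudo-names sort after all A-Z surnames."""
    return name.upper()
```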

Current Implementation Progress

There are 15 main groups of files (potentially) generated by the index generation programs, plus 21 index levels, making a total of 36 distinct entities as follows:

Table of Contents
Issue Checklists: (v1 only)
Publisher Index: (v1 only)
- Publisher Top Level Index
- Publisher Listings
Magazine Issue Index:
Book Author Index: (not yet implemented)
- Book Author Top Level Index
- Book Author Intermediate Level Index
- Book Author Bottom Level Index
- Book Contents Listings
Book Title Index: (not yet implemented)
- Book Title Top Level Index
- Book Title Listings
Story Author Index:
Story Title Index:
Chronological Index:
Series Index:
Artist Index:
Biographical Notes:
Full-Text Links:
- Full-Text Links Top-Level Index
- Magazine Issues with Full-Text Links (v2 only)
Statistics
New Additions

The above names will be used where appropriate in the rest of this document, and each has been linked to a representative sample page in the current indexes for illustration purposes.

Other than the points listed under Known Issues everything should be in place so please let me know if you spot anything missing (as well as anything that is malformed/links to the wrong place/is just downright wrong).

Known Differences

There are a handful of features in the current index that I am thinking of not implementing in the new software and would welcome feedback on:

The semi-capitalization of Author Names in the Story Author Index (see Discussion Points below)
The Issue Checklists: this strikes me as a lot of additional code for relatively minor benefit; at some point I hope to look at integrating the indexes and the GCP Website more closely so that the GCP Illustrated Checklists can link through to the indexes.
The "other pseudonyms" links in the Story Author Intermediate Level Index and Story Author Listings; I'm not sure these are needed any more now that all pseudonymous items are listed under the main author.
When the Crime Fiction Index was originally created, a feature was added to allow the indentation of column headers to be suppressed in the Story Author Index for specified headings; this was used when the heading spanned multiple authors such that each typically only had a single item in the "column", making the indentation unnecessary. I had never noticed that, in v1, this had also been converted into a pseudo-series index linking to the title index (see the handling of "The Story That Won: High Protein Diet" by Susan Osborn). While I can see the attraction of listing all the titles in this "series", I think it would be better done (if at all) in chronological order in the Series Index. I will consider the latter for a later release but do not intend to implement the current approach.
The Publisher Index was specifically created for the Crime Fiction Index and the records used to populate it do not (generally) exist in the rest of the database. I'm thinking of a more general solution that might prove useful across the whole database so do not plan to implement the CFI-specific version in the near future.

Discussion Points

Semi-capitalization of Author Names

In the Story Author and Book Author Indexes, author names are partially capitalized (e.g. ABBE, GEORGE (Bancroft)), but the algorithm does throw up some anomalies (such as AARD-vark’sson, AVRAM and ABREN, KATH’anth). For some reason the author names in the Biographical Notes aren't capitalized and, to my untrained eye, they look just as good (if not better), so I'm inclined to leave them uncapitalized in all the indexes.
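The three examples above are all consistent with a simple rule: upper-case each word only up to its first non-letter character. This is a guess reverse-engineered from those examples, not Bill's actual algorithm, but it shows how the anomalies could arise: the hyphen in "Aard-vark’sson" and the apostrophe in "Kath’anth" stop the capitalization part-way through the word, while "(Bancroft)" is left alone because it begins with a bracket.

```python
def semi_capitalize(name: str) -> str:
    """Upper-case each space-separated word up to its first non-letter character.

    A hypothetical rule chosen to reproduce the examples in the text.
    """
    words = []
    for word in name.split(" "):
        cut = 0
        while cut < len(word) and word[cut].isalpha():
            cut += 1
        words.append(word[:cut].upper() + word[cut:])
    return " ".join(words)
```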

How should "Artists" be handled?

One issue that has been unresolved for some time is the distinction between authors and artists. At the moment, anybody listed as the illustrator of a book/magazine or story, or as the "author" of an item with an item type of "il" (I think), is treated as an artist. Some questions that arise include:

should cartoonists (i.e. "authors" of an item with an item type of "ct") be included? Similarly for item type "ia"?
some item types (such as "pi" and "cs") sometimes list the associated artist separately and sometimes not (e.g. when the artist is solely responsible for the item), which will affect where the item is listed.
when we have an item that is about an individual, there is no easy way of knowing if the individual concerned is an author or an artist so I guess that at present we might have a "[ref.]" entry in the author index for somebody who only appears in the artist index.

Aggregation of [unknown items]

As discussed above, both versions of the index attempt to aggregate repeated items such as columns or serial parts. This approach is also used for multiple stories with the same title (typically when there is a series of stories about a given character which all have the same title). In the v1 indexes this also applies to "unknown" stories, as with Myrtle Juliette Corey. I'm not convinced this makes a lot of sense, as it implies a commonality among the stories which doesn't exist, so in v2 these are listed separately rather than being aggregated, but I would welcome views on the matter.
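The v2 behaviour described above (aggregate repeated titles, but keep unknown items separate) can be sketched as a grouping pass over one author's items in publication order. The `Item` type, the `"[unknown]"` placeholder title and the list-of-tuples output are all hypothetical; the real data files may mark unknown stories differently, and the real aggregation also handles serial parts and abbreviated dates.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical placeholder title for stories whose title is not known.
UNKNOWN_TITLE = "[unknown]"


@dataclass(frozen=True)
class Item:
    title: str
    issue: str


def aggregate(items: List[Item]) -> List[Tuple[str, List[str]]]:
    """Group one author's items by title, keeping unknown-title items separate.

    Groups appear in order of their first occurrence.
    """
    groups: Dict[str, List[str]] = {}
    result: List[Tuple[str, List[str]]] = []
    for item in items:
        if item.title == UNKNOWN_TITLE:
            result.append((item.title, [item.issue]))  # never aggregated
            continue
        if item.title not in groups:
            groups[item.title] = []
            result.append((item.title, groups[item.title]))  # shares the list
        groups[item.title].append(item.issue)
    return result
```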

Normalisation of volume/issue numbers

In general, when indexing magazines, we try to capture the information (e.g. author names and titles) in the magazine as it is listed in the magazine rather than normalising it (beyond capitalisation) - e.g. if one issue had a story by Robert Heinlein and another by Robert A. Heinlein then we record them as such, rather than changing the former to Robert A. Heinlein, say. (Actually, in the very early days this wasn't the case so some of the very old data in the database is incorrect in this regard).

One area this has never applied to is the recording of volume and issue numbers, which, for the v1 index programs, are always in the format vnn #mm (a hangover from an early version of the code which relied on these fields to sort the data into order). For some years I have been recording these as specified in the magazine and simply converting them to the standard format when I pass the data to Bill.

It will never be possible to use the "correct" volume/issue number format for the whole of the FMI, partly because of the time involved in correcting 150,000 issues, but mainly because we simply don't have the information: much of the data in the FMI comes from secondary sources which don't list such things "correctly".

The question is whether the "mixed mode" is acceptable with a degree of tidying up (e.g. in most pulps you can extrapolate the format of a whole batch of issues if you know the format of the first and last in the range) or whether people feel it is so "untidy" that we stick with vnn #mm ad infinitum.

After some discussion the current approach is to normalise them in the Magazine Issue Bottom Level Index but leave them as entered in the Magazine Issue Listings (which is analogous to the way we handle author names).
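The split described above (normalised in the Bottom Level Index, as entered in the Listings) needs a converter from the recorded form to vnn #mm that can fail gracefully. In this minimal sketch the accepted spellings ("Vol. 12, No. 3", "Volume 12 Number 3", "v12 #3") are assumptions, zero-padding is not applied, and anything unrecognised (Roman numerals, for instance) returns `None` so the caller can fall back to the entered text.

```python
import re
from typing import Optional

# A few common spellings; Roman numerals and whole numbers are not handled.
_VOL_ISSUE = re.compile(
    r"^\s*(?:vol(?:ume)?\.?|v)\s*(\d+)\s*[,;]?\s*(?:no\.?|number|#)\s*(\d+)\s*$",
    re.IGNORECASE,
)


def normalise_vol_issue(as_entered: str) -> Optional[str]:
    """Return the vnn #mm form for the Bottom Level Index, or None if unparseable."""
    match = _VOL_ISSUE.match(as_entered)
    if match is None:
        return None
    volume, number = (int(g) for g in match.groups())
    return f"v{volume} #{number}"
```

The Listings would display `as_entered` unchanged, while the Bottom Level Index would use `normalise_vol_issue(as_entered) or as_entered`.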
