At the moment it is not possible to regenerate the FictionMags Indexes with
the original suite of programs written by Bill Contento, and it is unclear when
it might again be possible to do so. In an attempt to address this, Phil Stephensen-Payne
has developed a new suite of programs (v2) and this document will attempt to
provide some background on the project and to highlight some of the differences
between the indexes generated by the two sets of software.
A couple of points should probably be emphasized at the outset:
Bill's programs are extremely sophisticated and have been developed over
a period of 40 years, during which many problems have been encountered and
overcome. It is inevitable that some of these problems will reappear in the
v2 programs (as well as exciting new ones!) so it is likely to be some time
before the v2 indexes reach the same level of sophistication as the v1 indexes.
As the v2 programs are being developed from a completely different code
base to the v1 programs, it is inevitable that some things that were easy
to achieve in the v1 indexes might be much harder to achieve with the v2 programs,
so some features may be delayed for a while (or indeed never implemented).
Fortunately the flipside is also true as some things that proved impossible
to achieve in the v1 indexes have proved to be much easier with the v2 programs.
The current intention is to generate a set of indexes that, in general, look
and feel as close to the v1 indexes as possible, partly because that format
is familiar to the users, but mainly because it has proven to be an excellent
format so there seems little need to mess with it.
The remainder of this document is divided into the following sections for ease
of reference:
The documentation of the Index Structure
also gives some more specific details on the two implementations. For a discussion
of recent changes to the software, as well as a list of known problems and possible
future enhancements, see the Change Log.
As always, all comments are welcome and should be sent to
New Features in the v2 Indexes
There are a number of new/different features in the v2 Indexes:
Inclusion of Book Files
Although an earlier version of the v1 programs was capable of handling both
books and magazines (cf. The
Locus Index to Science Fiction: 1984-1998), the current version cannot do so
because of changes made to support the FictionMags Index Family properly.
The v2 programs have been written so that they can handle both books and magazines
and the intention is that the v2 Indexes will include both.
However the data files containing the data
for the books are in rather a poor state, partly because they are an amalgam
of multiple, disparate sources (e.g. Al Hubin's Index to Crime Fiction IV
and the Miscellaneous Anthologies Index) and partly because little work
has been done over the years to bring them up to the same standard as the magazine
data files. From time to time I have tried to do something about this, but after
five or more years have only managed to "clean up" about 5% of the
data. At the moment I am not sure how to address this.
In addition there are various questions
about how book files should best be handled, as the old Locus index had a number
of problems. For that reason, although an early "proof of concept"
version of the Book Author Index was demonstrated, I intend to concentrate on
the magazine indexes (which is all that will be published for the time being)
and then revisit the book index when things quieten down.
Inclusion of Pseudonymous Stories in Author Lists
There was an anomaly in the v1 indexes that, if Author A wrote a story under
house name B, the story would be listed under both A and B in the indexes; however,
if Author A wrote a story under private pseudonym C, the story would only be
listed under the pseudonym. This has been addressed in the v2 indexes such that
all items known (or suspected) to be by author A will be listed as by author
A (though this has raised its own issues as discussed under Known
Issues below). Note that one side-effect of this is that the "also
as xxx" clauses in v1 are no longer needed as the details of such alternate
titles/authors are listed in full. See, for example, Edward Aarons in
v1 & v2.
Portability and Flexibility
In an attempt to avoid a repeat of the current situation, the v2 programs have
been written in such a way that they should be usable on any computer
(though this has yet to be tested). The generation of each index is controlled
by an Index Configuration File,
and the page layouts are based on external boilerplate files, which should allow
a degree of flexibility without requiring program changes.
Minor implementation differences
The following minor differences have been implemented deliberately (but are
still open to debate):
Page Sizes: The point at which page breaks occur will be different
in the new indexes (I doubt it would be possible to get them exactly the same).
The new algorithms are based around minimum and maximum page sizes. If a page
has exceeded the minimum page length and reaches a natural break (e.g. a new
issue in the Magazine Issue Listings) then a new page is started; if no such
break is found, a new page is always started once the page exceeds the maximum
page length. For the current version, the minimum and maximum have been set to
300 and 1000 (though note that, technically, what is counted is the number of
input data lines, not the number of lines output to the index).
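The page-break rule described above can be sketched roughly as follows. This is only an illustration in Python, not the actual v2 code: the function name paginate, the (text, starts_natural_break) input shape and the exact handling of the boundary cases are all assumptions; only the limits of 300 and 1000 input lines come from the text.

```python
from typing import Iterable, List, Tuple

MIN_PAGE_LINES = 300    # minimum quoted in the text
MAX_PAGE_LINES = 1000   # maximum quoted in the text


def paginate(lines: Iterable[Tuple[str, bool]],
             min_len: int = MIN_PAGE_LINES,
             max_len: int = MAX_PAGE_LINES) -> List[List[str]]:
    """Split input data lines into pages (hypothetical helper).

    Each input item is (text, starts_natural_break); the flag marks a line
    that begins a natural break, such as a new magazine issue.
    """
    pages: List[List[str]] = []
    current: List[str] = []
    for text, starts_break in lines:
        # Break at a natural boundary once the page has exceeded the minimum.
        at_break = starts_break and len(current) > min_len
        # Otherwise force a break when one more line would exceed the maximum.
        too_long = len(current) >= max_len
        if current and (at_break or too_long):
            pages.append(current)
            current = []
        current.append(text)
    if current:
        pages.append(current)
    return pages
```

Counting input lines rather than output lines keeps the rule independent of how each record is eventually rendered, which matches the caveat in the text.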
Issue Dates: The format of magazine issue dates has changed slightly
in most cases (except when dealing with aggregated items). Some examples include:
"Mar 1933" is now "March 1933"
"Sep 1 1929" is now "September 1, 1929"
"Sep #2 1927" is now "2nd September 1927"
In addition, where a magazine issue is identified by whole issue number and
month (e.g. American Dollar Western Magazine) both issue number and
month are listed (e.g. American Dollar Western Magazine #3, November
1953). For example compare the listings for E. Hamilton Clay in v1
& v2.
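The three date examples above suggest a simple mapping from the v1 display form to the v2 form. The Python sketch below is illustrative only: the function names are hypothetical, it works from v1 display strings (whereas the real programs presumably work from the underlying data), and it covers just the three patterns quoted, leaving anything else unchanged.

```python
import re

MONTHS = {
    "Jan": "January", "Feb": "February", "Mar": "March", "Apr": "April",
    "May": "May", "Jun": "June", "Jul": "July", "Aug": "August",
    "Sep": "September", "Oct": "October", "Nov": "November", "Dec": "December",
}


def ordinal(n: int) -> str:
    """Return 1st, 2nd, 3rd, 4th, ... for a positive integer."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def expand_issue_date(v1_date: str) -> str:
    """Convert a v1-style issue date to the v2 style (hypothetical helper)."""
    # "Sep #2 1927" -> "2nd September 1927" (second issue of the month)
    m = re.fullmatch(r"([A-Z][a-z]{2}) #(\d+) (\d{4})", v1_date)
    if m and m.group(1) in MONTHS:
        return f"{ordinal(int(m.group(2)))} {MONTHS[m.group(1)]} {m.group(3)}"
    # "Sep 1 1929" -> "September 1, 1929" (day of the month)
    m = re.fullmatch(r"([A-Z][a-z]{2}) (\d{1,2}) (\d{4})", v1_date)
    if m and m.group(1) in MONTHS:
        return f"{MONTHS[m.group(1)]} {int(m.group(2))}, {m.group(3)}"
    # "Mar 1933" -> "March 1933"
    m = re.fullmatch(r"([A-Z][a-z]{2}) (\d{4})", v1_date)
    if m and m.group(1) in MONTHS:
        return f"{MONTHS[m.group(1)]} {m.group(2)}"
    return v1_date  # unrecognised form: leave as-is
```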
Secondary Authors: The display of secondary authors has been changed
slightly as discussed under the description of Issue
Listings. In addition, if we have a note saying "translated from"
and a secondary author labelled as a translator an attempt is made to combine
the two into a single coherent note; something similar is attempted with "adapted
from/by". For example compare the listings for Rudolph Baumbach
in v1
& v2.
Author References: In v1, items that referred to an author (i.e.
appeared in the Story
Author Listings under the ", [ref.]" sub-heading), were always
put inside double quotation marks. This struck me as confusing, partly because
it looks odd when used on fiction types (such as "sa") and partly
because it doesn't easily distinguish between items whose title was already
in double quotation marks and those that weren't. For these reasons, the quotation
marks are omitted from v2. For example, see the subject entries for John
Greenleaf Whittier in v1
& v2.
Authors with trailing ", Mrs." or ", Miss": Because
of the way the data is held internally, some names were displayed in v1 with
a trailing ", Mrs." or ", Miss" (or similar). In v2 these
have been moved to the front of the name in the usual way. For example see
the final entry for Anthony Alger in v1
& v2.
Item Note Text: In v1, in the Magazine Issue Listings, all Item Notes
are appended to the item details; (partly) as an experiment, in v2 Item
Appearance Notes appear on the same line as the item details while Item
Notes appear on the next line - I would welcome views on this. See for
example the October 1890 issue of Short Stories in v1
& v2.
Illustrators: In v1, in the Magazine Issue Listings, the illustrator(s)
for an item are displayed after the Item Note Text, which strikes me
as a little odd (particularly when the notes are extensive). In v2 I've moved
these to follow the Item Type/Prior Publication Details. For example, see
the penultimate entry in the August 1894 issue of The Strand Magazine
in v1
& v2.
Subject Names: In v1, subject names are simply displayed in square
brackets. I felt this could be a little confusing for users so, in v2, I have
prefaced the names with "Ref. ". For example see the second entry
in the October 1894 issue of The Strand Magazine in v1
& v2.
Extracts: In v1, if an item was qualified with a phrase such as "from
<Book Title>" the text was displayed as part of the Item Title
(where it is in the underlying data); in v2 this has been moved to sit between
the Item Type and the Prior Publication Details, where it would seem to make
more sense. For example, see the entry toward the end of the January 1917
issue of The Strand Magazine in v1
& v2.
Incomplete Listings: The data files identify Incomplete Listings
by means of a flag, an issue note saying something like "Issue not Found"
and an item record saying "Need Contents". In v1, both of the latter
are displayed but in v2 I do not display the "Need Contents" line,
partly because omitting it is easier given the way I've written the code
and partly because it seems redundant. In addition, these issues in v1 are
just flagged in the bottom level index with (Placeholder); this has
been changed in v2 to (Incomplete)
for issues that have been partially indexed and (Missing)
for those for which we have no contents information at all. For example see
the entries for Ace G-Man Stories (Canada) in v1
& v2.
Columns with & without item titles: In v1, in the Story Author
Index, if only some instances of a column have item titles, the column is listed
twice - once aggregated without item titles followed by a separate listing
of those with titles (see Stookie Allen's "Illustrated Crimes" for
example); in v2 these have been combined with the standalone entry acting
as the column header.
Aggregated items: One of the more powerful features in the Story
Author Index is the aggregation of repeated items so that, for instance, all
parts of a serial are listed on a single line with each of the constituent
issues separately listed and hyperlinked, with issue dates abbreviated to
avoid repetition of magazine name and year. In some very complex cases, such
as John North's Where
to Go and How to Get There, this already presents some problems. I've tried
to address these in v2 but there may be more work needed.
Magazine "see under" links: In v1, magazine "see under"
links always pointed to the same level of the index. Thus, for instance, the
link for Two-Gun Western Novelets Magazine from the bottom-level
index linked to Two-Gun Western in the bottom-level
index; while the link in the listings
took you to Two-Gun Western in the listings.
At the moment, in v2, both links will take you to the listings level - this
is partly because it was easier to do, and partly because I felt it made more
sense - I would welcome opinions on this.
Story Title Index: In v1
this was a simple 2-level index but the first page had become huge (6400 lines)
for the FMI so the decision was taken to convert it to a 3-level index in
v2. In addition, in v1 the story titles were sometimes abbreviated with a
trailing ellipsis (see the end of the second line) - it is unclear what the
purpose of this was so it has not (yet) been implemented in v2.
Chronological Index: In v1, if a single author was continued over
multiple pages, no continuation lines were output to the intermediate-level
index; in v2 continuation lines have been added identifying the date of the
first item on each continuation page. In addition, in v1 if an item was reprinted
under a different title but the same author, then both titles were listed
under the original publication date (e.g. "The Arizona Kid"/"Arizona
Yellow" by Stuart
Adams), which struck me as confusing; in v2 only the original title is listed.
Magazine Issue Index: In v1, no matter how many issues a single magazine
had, there were never any continuation pages in the Bottom Level Index
leading to some very large pages; in v2 continuation pages have been added
and continuation lines have been added to the intermediate-level index identifying
the date of the first issue on each continuation page. See for example Short
Stories in v1
& v2.
Decoded URLs: Wikipedia increasingly uses accented characters in
URLs, such as https://en.wikipedia.org/wiki/Emilia_Pardo_Bazán.
As these cause problems in the multi-system environment, they are stored in
the database in their "encoded" form, such as https://en.wikipedia.org/wiki/Emilia_Pardo_Baz%C3%A1n
which is portable across all environments. In v1
these encoded URLs were displayed as stored while in v2 an attempt is made
to decode them back to the HTML equivalents of the original strings.
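A minimal Python sketch of this kind of decoding follows. It is not the v2 implementation: the function name display_url is hypothetical, and it assumes that "HTML equivalents" means numeric character references such as &#225; for á. It produces display text only; the stored, encoded form would still be the safer choice for the link target itself.

```python
import html
from urllib.parse import unquote


def display_url(stored_url: str) -> str:
    """Turn a percent-encoded URL into HTML-safe display text (sketch)."""
    decoded = unquote(stored_url)               # "%C3%A1" -> "á" (UTF-8)
    escaped = html.escape(decoded, quote=False)  # protect &, < and >
    # Replace any remaining non-ASCII character with a numeric reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(display_url("https://en.wikipedia.org/wiki/Emilia_Pardo_Baz%C3%A1n"))
```

Emitting pure ASCII output keeps the generated page independent of the character encoding used on whichever system builds the index.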
"Also as" authors: There are cases where we know a particular
author has used multiple names but do not know which is their real name, and
these are defined by also
as records. The documentation states that these links should only be displayed
when both authors have entries in the list, but in v1 they seem to be displayed
in all cases (e.g. Adams,
Stewart in the WFI). This has been corrected in v2.
Magazines formatted as books: Some magazines (such as New Worlds)
published some issues as books rather than regular issues where the "title"
of the issue doesn't follow the usual format for magazine issues. In v1, when
these were listed in the Magazine Issue Bottom Level Index only the
publication details were listed with no indication of the issue titles (as
with New
Worlds Quarterly). At the same time, the listings for these in the
Magazine Issue Listings did contain the book title, but did not contain
the usual link back to the Bottom Level Index feature header (as here for
New
Worlds Quarterly); in v2 both problems have been corrected.
List of magazine issues for a given editor: In v1, editors of magazine
issues have entries in the Story Author Listings that are designed to look
like the "Pub. Info." entries (as here
for Robert Lowndes); as the "Pub. Info" entries are being phased
out, this has been changed in v2 to list all such issues as aggregated items
with links to each individual issue.
Cross-index linking: There are
five distinct indexes that are grouped into author order (Book Author; Story
Author; Chronological; Artist & Biographical Notes) but the linkages between
them in v1 seem a little incoherent. Of the four supported in v1, two (Story Author
& Artist) provide "about" links to the Biographical Notes in
both the Intermediate Level Index and the Listings, but the Chronological
Index doesn't. The Story Author Index also provides links to both the Chronological
and Artist indexes, but only in the Listings, while the Chronological Index
only provides links to the Stories index (also only in the Listings). The
Biographical Notes provide links to the Story Author and Artist Indexes
in both the Intermediate Level Index and the Listings, but not to the Chronological
Index. In v2 cross-links are provided in both the Intermediate Level Index
and the Listings from all five indexes to each of the other four (where relevant).
Note that in v1, the Biographical Notes (only) also had an "assoc."
link to the [ref.] section of the Story Author Listings for an author where
relevant - I can't see the value of this so have not implemented it in v2.
Handling of anonymous authors: In
v1 anonymous authors were listed as Anonymous in the Story
Author Index (as the primary entry), as Anon. in the Issue
Index when a primary author and as uncredited when a secondary
author. In v2 an attempt has been made to rationalise all of these into
[uncredited] which has the added advantage that it sorts to the end
of the Story Author Index along with all the other "odd" names like
[The Readers] and [Various]. In addition, when used as a secondary
author it is generally suppressed completely so that, for example, rather
than "translated,
uncredited" we have "(translated)".
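The rationalisation and the resulting sort order can be illustrated with a short Python sketch. Everything here is an assumption made for illustration (the helper names, the set of anonymous forms taken from the description above, and the sort key); it is not how the v2 programs are known to work.

```python
# The three v1 forms mentioned in the text.
ANONYMOUS_FORMS = {"Anonymous", "Anon.", "uncredited"}


def normalise_author(name: str) -> str:
    """Map every anonymous form onto the single v2 form (hypothetical)."""
    return "[uncredited]" if name in ANONYMOUS_FORMS else name


def index_sort_key(name: str):
    """Sort bracketed pseudo-names after all ordinary names."""
    return (name.startswith("["), name.casefold())


names = ["Whittier, John Greenleaf", "Anonymous", "[Various]",
         "Aarons, Edward", "Anon.", "[The Readers]"]
ordered = sorted({normalise_author(n) for n in names}, key=index_sort_key)
for name in ordered:
    print(name)
```

Here the two ordinary names come first, followed by [The Readers], [uncredited] and [Various].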
Series listings: In v1 support was provided for multiple views of
a given series: By Author, By Title and/or By Date, but the "By Author"
view was always displayed even when inappropriate. In v2 this view can be
suppressed if required.
Links from GCP Website: In v1, links from the "Big List"
on the GCP Website went to the Magazine
Issue Bottom Level Index; in v2 they go to the Magazine Issue Listings
- this was primarily done because it makes the code much neater, but I'm inclined
to think it is more useful as well.
Current Implementation Progress
There are 15 main groups of files (potentially) generated by the index generation
programs, plus 21 index levels, making a total of 36 distinct entities as follows:
The above names will be used where appropriate in the rest of this document
and each has been linked to a representative sample page in the current indexes
for illustration purposes.
Other than the points listed under Known Issues
everything should be in place so please let me know if you spot anything missing
(as well as anything that is malformed/links to the wrong place/is just downright
wrong).
Known Differences
There are a handful of features in the current index that I am thinking of
not implementing in the new software and would welcome feedback on:
The semi-capitalization of Author Names in the Story Author Index (see Discussion
Points below)
The Issue Checklists:
this strikes me as a lot of additional code for relatively minor benefit;
at some point I hope to look at integrating the indexes and the GCP Website
more closely so that the GCP Illustrated Checklists can link through to the
indexes.
The "other
pseudonyms" links in the Story Author Intermediate Level Index and
Story Author Listings; I'm not sure these are needed any more now that all
pseudonymous items are listed under the main author.
When the Crime Fiction Index was originally created, a feature was
added to allow the indentation of column headers to be suppressed in the Story
Author Index for specified headings - this was used when the heading spanned
multiple authors such that each typically only had a single item in the "column",
making the indentation unnecessary. I had never noticed that, in v1, this
had also been converted into a pseudo-series index linking to the title index
(see the handling of "The Story That Won: High Protein Diet" by
Susan Osborn).
While I can see the attraction of listing all the titles in this "series",
I think it would be better done (if at all) in chronological order in the
Series Index. I will consider the latter for a later release but do not intend
to implement the current approach.
The Publisher Index
was specifically created for the Crime Fiction Index and the records
used to populate it do not (generally) exist in the rest of the database.
I'm thinking of a more general solution that might prove useful across the
whole database so do not plan to implement the CFI-specific version in the
near future.
Discussion Points
Semi-capitalization of Author Names
In the Story Author and Book Author Indexes, author names are partially capitalized
(e.g. ABBE, GEORGE (Bancroft)) but the algorithm does throw up some anomalies
(such as AARD-varksson, AVRAM and ABREN, KATHanth). For some reason
the author names in the Biographical Notes aren't capitalized and, to my untrained
eyes, they look just as good (if not better) so I'm inclined to leave them uncapitalized
in all the indexes.
How should "Artists" be handled?
One unresolved issue for some time has been the distinction between authors
and artists. At the moment anybody listed as the illustrator of a book/magazine
or story, or as the "author" of an item with an item type of "il",
is treated as an artist (I think). Some questions that arise include:
should cartoonists (i.e. "authors" of an item with an item type
of "ct") be included? Similarly for item type "ia"?
some item types (such as "pi" and "cs") sometimes list
the associated artist separately and sometimes not (e.g. when they are solely
responsible for the item) which will affect where the item is listed.
when we have an item that is about an individual, there is no easy
way of knowing if the individual concerned is an author or an artist so I
guess that at present we might have a "[ref.]" entry in the author
index for somebody who only appears in the artist index.
Aggregation of [unknown items]
As discussed above, both versions of the index attempt
to aggregate repeated items such as columns or serial parts. This approach is
also used for multiple stories with the same title (typically when there is
a series of stories about a given character which all have the same title).
In the v1 indexes this also applies to "unknown" stories as with Myrtle
Juliette Corey. I'm not convinced this makes a lot of sense as it implies
a commonality among the stories which doesn't exist, so in v2 these are listed
separately rather than being aggregated, but I would welcome views on the matter.
Normalisation of volume/issue numbers
In general, when indexing magazines, we try to capture the information (e.g.
author names and titles) in the magazine as it is listed in the magazine rather
than normalising it (beyond capitalisation) - e.g. if one issue had a story
by Robert Heinlein and another by Robert A. Heinlein then we record them as
such, rather than changing the former to Robert A. Heinlein, say. (Actually,
in the very early days this wasn't the case so some of the very old data in
the database is incorrect in this regard).
One area this has never applied to is the recording of volume and issue numbers
which, for the v1 index programs, are always in the format vnn #mm (this is
a hangover from an early version of the code which relied on these fields to
sort the data into order). For some years I have been recording these as specified
in the magazine and simply converting them to the standard format when I pass the
data to Bill.
It will never be possible to use the "correct" volume/issue number
format for the whole of the FMI, partly because of the time involved in correcting
150,000 issues, but mainly because we simply don't have the information - much
of the data in the FMI comes from secondary sources which don't list such
things "correctly".
The question is whether the "mixed mode" is acceptable with a degree
of tidying up (e.g. in most pulps you can extrapolate the format of a whole
batch of issues if you know the format of the first and last in the range) or
whether people feel it is so "untidy" that we stick with vnn #mm ad
infinitum.