Introduction
Underlying the FictionMags
Index Family is a formal data format that is used for coding up all the
source records. To facilitate future development and to help clarify the format/usage
of this data, this set of pages will attempt to describe, formally, what this
format comprises.
The first point to stress is that,
historically, there were two different formats in use:
- the format(s) used by Bill Contento
for maintaining the files in the US
- the format used by Phil Stephensen-Payne
(and Ian Covell) for maintaining the files in the UK
While these two formats were very
similar in practice, the latter contained a number of extensions (e.g. in the
format of publication details). To facilitate
the exchange of data there are many parts of the system which exclusively use
the "lowest common denominator" (typically the US format), translating
the data as needed. While this is no longer necessary, the work involved in removing
it is too extensive to consider at the moment.
Note that, when illustrating the
format of particular fields, the symbol ␢ is used to indicate a space
where appropriate.
Basic Data File Layout
Each data file consists of a number
of groups of records typically (though not always) separated by a blank line,
where each group of records represents a single item (e.g. a book or magazine)
being indexed. By convention, files listing books present the groups in alphabetical
order of author surname, and in alphabetical order of book title (excluding
leading articles) within an author; and files listing magazines present them
in chronological order of magazine issue date. For magazine indexes the order
of the groups will determine the order of entries in the Issue Index - specifics
of magazine indexes are discussed further on the Magazine Indexes page.
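As a rough illustration of the layout described above, the following Python sketch splits a file's text into groups at blank lines and builds a sort key for book titles that skips a leading article. It is only a sketch under stated assumptions: it handles only the blank-line case (groups are not always separated that way), the function names are hypothetical, the sample record lines are placeholders rather than real records, and the article list ("A", "An", "The") is a guess at what "leading articles" covers.

```python
import re

# Assumption: leading articles are the English "A", "An" and "The";
# the real rules used by the indexes may cover more cases.
LEADING_ARTICLES = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)


def split_groups(text):
    """Split data-file text into groups of record lines at blank lines.

    Only the blank-line convention is handled; files that separate
    groups in some other way would need extra logic.
    """
    groups, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the group collected so far.
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def title_sort_key(title):
    """Return a case-insensitive sort key that ignores one leading article."""
    return LEADING_ARTICLES.sub("", title.strip(), count=1).casefold()


# Placeholder lines, not real record syntax.
sample = "first record\nsecond record\n\nthird record\n"
groups = split_groups(sample)

titles = ["The Big Sleep", "A Study in Scarlet", "Farewell, My Lovely"]
ordered = sorted(titles, key=title_sort_key)
```

Here `groups` holds two groups (two lines, then one line), and `ordered` places "The Big Sleep" under B and "A Study in Scarlet" under S.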
Each group of records contains up
to three different record types, which must be in order:
These are all described in more detail
on the specific pages. There are also separate pages devoted to the complexities
of some of the more complicated fields:
Control Files
Associated with the data files are
a number of control files which are used to expand abbreviations in the files
and/or for formatting the files in a manner suitable for publication in the
indexes. There are currently five of these:
- Abbrev.cvt:
This defines the abbreviations used to represent magazines or books referenced
in the data.
- Covers.cvt:
This identifies which cover image (if any) should be used to accompany a specified
magazine issue
- Ft-Links.cvt:
This identifies which issues have "full text" versions available
online
- Pseud.cvt:
This is used to identify different names used by a given author (particularly
name variations and pseudonyms) as well as to provide further information
about a given author and to identify author names that are potentially ambiguous.
- Series.cvt:
This is used to identify which series IDs are used by which authors and is
primarily intended to help standardise series IDs so that all appropriate
entries are listed together in the series index and to prevent two distinct
series with the same name being inadvertently combined. It also allows some
control over the way in which series IDs (and series prefixes) are displayed
in the index.
Each of these (except Covers.cvt)
has an associated xxx.new file which is used for holding changes since the data
was last synchronised - see the section on Synchronising
two sets of data.
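The relationship between the control files and their change files can be sketched in Python as follows. This assumes that "xxx.new" means the control file's base name with the ".cvt" extension replaced by ".new" (e.g. Abbrev.new); that naming is an assumption rather than something stated here, and the function name is hypothetical.

```python
from pathlib import Path

# Control files named in this section.
CONTROL_FILES = [
    "Abbrev.cvt",
    "Covers.cvt",
    "Ft-Links.cvt",
    "Pseud.cvt",
    "Series.cvt",
]

# Covers.cvt is the stated exception with no change file.
NO_NEW_FILE = {"Covers.cvt"}


def pending_change_file(control_file):
    """Return the assumed name of the .new companion, or None if there is none.

    Assumption: the companion keeps the base name and swaps ".cvt" for ".new".
    """
    if control_file in NO_NEW_FILE:
        return None
    return Path(control_file).with_suffix(".new").name


companions = {name: pending_change_file(name) for name in CONTROL_FILES}
```

Under that assumption `companions` maps, for example, "Pseud.cvt" to "Pseud.new" and "Covers.cvt" to `None`.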
Support Files
There are also a number of support
files, typically specific to an individual index, that help control the way
the index is formatted:
- 00000.xxx:
This identifies which files form part of a given index and (optionally) the
order in which they should be presented
- 00config.xxx:
This defines the configuration for a given index
- 00xxx.mag:
This identifies any additional references that need to be added to a given
index
- pseud.xxx:
This identifies any entries in Pseud.cvt
that need to be treated in a special way (e.g. to ignore date checks)
- valnames.xxx:
This identifies any known anomalies that Valnames should ignore
- validate.xxx:
This identifies any known anomalies that Validate should ignore
In the first three cases, the "xxx"
is replaced by the acronym for the index (e.g. "CFI" for the Crime
Fiction Index).
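As a small illustration of the naming rule, this Python sketch expands the first three support-file templates for a given index acronym. The function name is hypothetical, and whether the acronym appears in upper or lower case in real file names is an assumption; the sketch simply uses the acronym as given.

```python
# Templates for the first three support files, where "xxx" stands for
# the index acronym.
SUPPORT_TEMPLATES = ["00000.xxx", "00config.xxx", "00xxx.mag"]


def support_file_names(acronym):
    """Substitute the index acronym for the "xxx" placeholder.

    Assumption: the acronym is used exactly as given (no case change).
    """
    return [template.replace("xxx", acronym) for template in SUPPORT_TEMPLATES]


cfi_files = support_file_names("CFI")
```

For the Crime Fiction Index this gives 00000.CFI, 00config.CFI and 00CFI.mag.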
Associated Pages
There are a number of associated
pages that provide more detail on some aspects of the index(es):
- User Documentation:
This is the user-facing documentation that is intended for anyone who wishes
to use and/or add to the indexes
- Index Structure:
This provides a detailed look at the structure of the indexes
(and the underlying HTML). It was primarily created to help in the development
of the v2 software but will be retained as an aid when reading the program
code for generating the indexes. Note that some changes were made to the HTML
during development that have not yet been documented.
- Magazine Indexes:
This looks in detail at some of the special ways in which the
data files are handled when used to define magazines rather than books.
- Programs:
This lists (and to a certain degree documents) all the programs developed
by Phil related to the data files, with an emphasis on those related to the
indexes, as well as a handful of general purpose (home-grown and shareware)
programs.
- Style Guide:
This discusses a number of areas (such as capitalisation) where
an attempt has been made to use a consistent "house style" across
the indexes, though there are still many (older) parts of the index which
do not conform to the guidelines.
- Trigraphs:
This provides a detailed list of the trigraphs supported (including internal
ones) with precise details of what they represent and how they are converted.
Notes
Lastly there are a number of pages
related to current (and past) discussions between multiple developers of the
indexes:
- Recent Changes:
This is a log of changes made to the documentation (by Phil)
so that others can see what has changed. (Note that this has not been updated
since early 2019 when Bill stopped contributing to the indexes).
- First Editions:
This describes the agreed process for determining first editions
within the indexes when multiple editions appeared (almost) simultaneously.
- First Appearance Notes:
This describes the agreed process for determining first
appearances within the indexes when an item appeared in multiple locations
simultaneously.
- Discussions:
This details some current (and historic) discussions that have been had on
various topics, along with proposed (or implemented) ways of addressing the
issues concerned.
- Indexing Process:
This discusses the steps taken when generating a new set of indexes.