FictionMags Data Format

Data Format

Main Record Types	Special Field Formats	Control Files	Support Files	Associated	Notes
Book ("A") Records	Author Fields	Abbrev.cvt	00000.xxx	User Documentation	Recent Changes
Book Qualifier ("D") Records	Title Fields	Covers.cvt	00config.xxx	Index Structure	First Editions
Item ("EA") Records	Item Types	Ft-Links.cvt	00xxx.mag	Magazine Indexes	First Appearance Notes
Item Qualifier ("EB/C/D/N/O/Q/T/X") Records	Publication Details	Pseud.cvt	pseud.xxx	Programs	Discussions
		Series.cvt	valnames.xxx	Style Guide	Indexing Process
		Synchronising	validate.xxx	Trigraphs

Introduction

Underlying the FictionMags Index Family is a formal data format that is used for coding up all the source records. To facilitate future development and to help clarify the format/usage of this data, this set of pages will attempt to describe, formally, what this format comprises.

The first point to stress is that, historically, there were two different formats in use:

the format(s) used by Bill Contento for maintaining the files in the US
the format used by Phil Stephensen-Payne (and Ian Covell) for maintaining the files in the UK

While these two formats were very similar in practice, the latter contained a number of extensions (e.g. in the format of publication details). To facilitate the exchange of data, many parts of the system use only the "lowest common denominator" (typically the US format), translating the data as needed. Although this is no longer necessary, the work involved in removing it is too extensive to consider at the moment.

Note that, when illustrating the format of particular fields, the symbol ␢ is used to indicate a space where appropriate.

Basic Data File Layout

Each data file consists of a number of groups of records, typically (though not always) separated by a blank line, where each group represents a single item (e.g. a book or magazine) being indexed. By convention, files listing books present the groups in alphabetical order of author surname and, within an author, in alphabetical order of book title (excluding leading articles); files listing magazines present them in chronological order of magazine issue date. For magazine indexes, the order of the groups determines the order of entries in the Issue Index; the specifics of magazine indexes are discussed further on the Magazine Indexes page.
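The ordering conventions above can be sketched as sort keys. This is a minimal illustration, not the indexes' own code: it assumes the author's surname has already been extracted (the author field format is covered on its own page) and that the only leading articles to ignore are the English "A", "An" and "The". The function names are hypothetical.

```python
from datetime import date

# Assumption: only these English articles are skipped; the real indexes
# may recognise others (e.g. foreign-language articles).
LEADING_ARTICLES = ("a", "an", "the")


def title_sort_key(title: str) -> str:
    """Lower-case the title and drop a single leading article, if present."""
    words = title.strip().split(None, 1)
    if len(words) == 2 and words[0].lower() in LEADING_ARTICLES:
        return words[1].lower()
    return title.strip().lower()


def book_sort_key(surname: str, title: str) -> tuple[str, str]:
    """Books: author surname first, then title without its leading article."""
    return (surname.lower(), title_sort_key(title))


books = [
    ("Wells", "The War of the Worlds"),
    ("Doyle", "A Study in Scarlet"),
    ("Wells", "Kipps"),
]
books.sort(key=lambda b: book_sort_key(*b))

# Magazines: groups are simply ordered chronologically by issue date.
issues = [date(1926, 6, 1), date(1926, 4, 1), date(1926, 5, 1)]
issues.sort()
```

Because "The" is ignored, "The War of the Worlds" sorts under "W" and so follows "Kipps" within Wells.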

Each group of records contains up to three different record types, which must appear in the following order:

A single Book Record, identified by a leading 'A'
Zero or more Note (Book Qualifier) Records, identified by a leading 'D'
Zero or more Item Records, identified by a leading 'E'
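A minimal sketch of how a reader might split a data file into groups and check the record order above. This is an assumption, not the FictionMags programs' actual logic: each non-blank line is taken to start with its record-type letter, and because groups are not always separated by a blank line, a new 'A' record is also treated as the start of a new group. The record contents in the sample are placeholders, not real FictionMags syntax, and the names `Group` and `parse_groups` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One indexed item (e.g. a book or magazine issue) and its records."""
    book: str                                        # the single 'A' record
    notes: list[str] = field(default_factory=list)   # 'D' records
    items: list[str] = field(default_factory=list)   # 'E' records


def parse_groups(lines):
    """Split data-file lines into groups and check the A, D*, E* ordering.

    Assumption: a new 'A' record starts a new group even without a
    preceding blank line; unknown leading letters are rejected.
    """
    groups = []
    current = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            current = None              # a blank line closes the current group
            continue
        kind = line[0]
        if kind == "A":
            current = Group(book=line)
            groups.append(current)
        elif current is None:
            raise ValueError(f"line {lineno}: group does not start with an 'A' record")
        elif kind == "D":
            if current.items:
                raise ValueError(f"line {lineno}: 'D' record after an 'E' record")
            current.notes.append(line)
        elif kind == "E":
            current.items.append(line)
        else:
            raise ValueError(f"line {lineno}: unknown record type {kind!r}")
    return groups


# Placeholder records: the third group follows the second with no blank line.
sample = [
    "A <book record 1>",
    "D <note>",
    "E <item 1>",
    "E <item 2>",
    "",
    "A <book record 2>",
    "A <book record 3>",
    "E <item>",
]
groups = parse_groups(sample)
```

Treating each 'A' record as a group boundary relies on the rule that every group has exactly one Book Record, so the parser does not depend on the blank-line convention.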

These are all described in more detail on their own pages. There are also separate pages devoted to the complexities of some of the more complicated fields:

Author Fields
Title Fields
Item Types
Publication Details

Control Files

Associated with the data files are a number of control files which are used to expand abbreviations in the files and/or to format the files in a manner suitable for publication in the indexes. There are currently five of these:

Abbrev.cvt: This defines the abbreviations used to represent magazines or books referenced in the data.
Covers.cvt: This identifies which cover image (if any) should be used to accompany a specified magazine issue.
Ft-Links.cvt: This identifies which issues have "full text" versions available online.
Pseud.cvt: This is used to identify different names used by a given author (particularly name variations and pseudonyms) as well as to provide further information about a given author and to identify author names that are potentially ambiguous.
Series.cvt: This is used to identify which series IDs are used by which authors and is primarily intended to help standardise series IDs so that all appropriate entries are listed together in the series index and to prevent two distinct series with the same name being inadvertently combined. It also allows some control over the way in which series IDs (and series prefixes) are displayed in the index.

Each of these (except Covers.cvt) has an associated xxx.new file which is used for holding changes since the data was last synchronised - see the section on Synchronising two sets of data.
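Assuming the "xxx" in each xxx.new name is the base name of its control file (so that Abbrev.cvt is paired with Abbrev.new), the companion files can be derived as in this small sketch. Both the naming rule and the function name `pending_change_files` are inferences for illustration, not taken from the FictionMags programs.

```python
from pathlib import PurePath

CONTROL_FILES = ["Abbrev.cvt", "Covers.cvt", "Ft-Links.cvt", "Pseud.cvt", "Series.cvt"]


def pending_change_files(control_files):
    """Map each control file to its assumed companion .new file.

    Covers.cvt is skipped because it has no associated .new file.
    """
    return {
        name: PurePath(name).with_suffix(".new").name
        for name in control_files
        if name != "Covers.cvt"
    }


changes = pending_change_files(CONTROL_FILES)
```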

Support Files

There are also a number of support files, typically specific to an individual index, that help control the way the index is formatted:

00000.xxx: This identifies which files form part of a given index and (optionally) the order in which they should be presented
00config.xxx: This defines the configuration for a given index
00xxx.mag: This identifies any additional references that need to be added to a given index
pseud.xxx: This identifies any entries in Pseud.cvt that need to be treated in a special way (e.g. to ignore date checks)
valnames.xxx: This identifies any known anomalies that Valnames should ignore
validate.xxx: This identifies any known anomalies that Validate should ignore

In the first three cases, the "xxx" is replaced by the acronym for the index (e.g. "CFI" for the Crime Fiction Index).
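The substitution rule for the first three support files can be illustrated with a small helper. The function `support_file_names` is hypothetical, and it assumes the acronym is inserted exactly as given; the real files may use a different letter case.

```python
def support_file_names(acronym: str) -> list[str]:
    """Return the names of the first three support files for one index.

    Only these three are covered, since the substitution rule is stated
    for those cases only.
    """
    return [
        "00000." + acronym,          # files forming the index, and their order
        "00config." + acronym,       # index configuration
        "00" + acronym + ".mag",     # additional references for the index
    ]


cfi_files = support_file_names("CFI")  # e.g. the Crime Fiction Index
```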

Associated

There are a number of associated pages that provide more detail on some aspects of the index(es):

User Documentation: This is the user-facing documentation that is intended for anyone who wishes to use and/or add to the indexes
Index Structure: This provides a detailed look at the structure of the indexes (and the underlying HTML). It was primarily created to help in the development of the v2 software but will be retained as an aid when reading the program code for generating the indexes. Note that some changes were made to the HTML during development that have not yet been documented.
Magazine Indexes: This looks in detail at some of the special ways in which the data files are handled when used to define magazines rather than books.
Programs: This lists (and, to a certain degree, documents) all the programs developed by Phil related to the data files, with an emphasis on those related to the indexes, as well as a handful of general-purpose (home-grown and shareware) programs.
Style Guide: This discusses a number of areas (such as capitalisation) where an attempt has been made to use a consistent "house style" across the indexes, though there are still many (older) parts of the index which do not conform to the guidelines.
Trigraphs: This provides a detailed list of the trigraphs supported (including internal ones), with precise details of what they represent and how they are converted.

Notes

Lastly there are a number of pages related to current (and past) discussions between multiple developers of the indexes:

Recent Changes: This is a log of changes made to the documentation (by Phil) so that others can see what has changed. (Note that this has not been updated since early 2019 when Bill stopped contributing to the indexes).
First Editions: This describes the agreed process for determining first editions within the indexes when multiple editions appeared (almost) simultaneously.
First Appearance Notes: This describes the agreed process for determining first appearances within the indexes when an item appeared in multiple locations simultaneously.
Discussions: This details some current (and historic) discussions that have been had on various topics, along with proposed (or implemented) ways of addressing the issues concerned.
Indexing Process: This discusses the steps taken when generating a new set of indexes.