Calibration Services Specification

Requirements

Archive all calibration data with even a remote chance of being of future interest.
Provide transparent access to "the right" calibration data for analyzing a given event (e.g. during event reconstruction).
Support queries for access to a particular calibration data set.
Facilitate other forms of query, such as per-channel histories.
Facilitate automated report generation.

The Right Stuff

During event processing, the right calibration data

must be of the right type (won't do to fetch tracker alignment when what is needed is CAL pedestals!)
must be for the right instrument
should have a period of validity which includes the time the event was taken
should be of production quality (that is, the calibration output should be an accurate representation of the state of the instrument)
must be in a compatible format, one the client program can handle
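
The criteria above can be read as two predicates over a metadata record: one for the "must" items and one for the "should" items. The following is only a sketch in Python, not part of the specification. All names (CalibMeta, Request, meets_musts, meets_shoulds) are hypothetical. Treating the validity interval as half-open, and equating "production quality" with procedure level 'production' plus status 'OK', are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibMeta:
    """Hypothetical subset of the metadata kept for one calibration data set."""
    calib_type: str
    instrument: str
    vstart: float       # start of validity (any monotonic time unit)
    vend: float         # end of validity, assumed exclusive
    proc_level: str
    status: str
    data_format: str


@dataclass(frozen=True)
class Request:
    """What a client needs while processing one event (hypothetical)."""
    calib_type: str
    instrument: str
    event_time: float
    readable_formats: frozenset


def meets_musts(meta: CalibMeta, req: Request) -> bool:
    # Right type, right instrument, and a format the client can handle.
    return (meta.calib_type == req.calib_type
            and meta.instrument == req.instrument
            and meta.data_format in req.readable_formats)


def meets_shoulds(meta: CalibMeta, req: Request) -> bool:
    # Validity covers the event time and the calibration is production quality.
    return (meta.vstart <= req.event_time < meta.vend
            and meta.proc_level == "production"
            and meta.status == "OK")
```

A search could require meets_musts for every candidate and use meets_shoulds to rank or filter what remains.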

Elements of an Implementation

Data and Metadata

From the above it's clear that we need to deal with two kinds of information: the calibration data proper, and information about a particular calibration data set, such as its period of validity. The latter will be called the metadata for the data set. The entire collection of metadata resides in the Calibration Registry. Metadata include, but are not limited to, criteria used in searching for a particular data set. Metadata fields are sufficiently generic that nearly all of them will be meaningful for all or almost all calibration types. These properties (highly structured, searchable) argue for keeping the metadata in a relational database. Current plans are to use MySQL, which has several desirable properties, including some advantages over Oracle:

It implements almost all standard SQL features (perhaps all by now), and certainly all that we would need; performance is reputed to be quite good.
It runs on all of our platforms.
It's open source and free.
It has a relatively friendly API.

Calibration data, as opposed to metadata, come in many forms. Output from a calibration procedure may be small or large, of fixed or variable size, more or less structured, depending on the procedure and the state of the instrument at the time of the calibration. A relational database representation would be suitable, perhaps optimal, for some procedures, but not others. Fortunately there is no requirement or even preference that all calibration data be stored the same way. It would be sensible to stick to a small number of supported formats since software resources are not infinite. Two or three (ROOT file, XML file, MySQL table??) ought to be enough to avoid any serious awkwardness in representing and accessing the data.

Gaudi Data Service Paradigm

A Gaudi data service, as described in the Gaudi Developer's Guide, will be implemented for calibration data. Such a service handles all the details of finding the right data set and ensuring that it is available in an in-memory representation (in Gaudi terminology, the Transient Data Store, or TDS). The client application accesses the data exclusively via TDS classes; it has no need to know anything about the persistent form of the data, nor does it need to know how to find a particular data set among several of the same kind.

Components of the Calibration Data Service will include

a Data Finder. The Data Finder (implementing the Gaudi IDataProviderSvc interface, or something similar) will fetch the appropriate data set if it is not already in the TDS. It will do this by delegating most of the work to a MySQL conversion service and ultimately to another conversion service which knows about the physical form the bulk calibration data is in.
Conversion services. For each supported persistent form, a class which can open a connection to a particular data set in that form.
Gaudi Converters. Classes which, given access to the persistent form of a data set, can convert it to its TDS representation. The number of these required will depend on the similarity (or not) of the different calibration output data sets, but there will be at least one Converter for each supported persistent form.

The Data Finder makes use of the remaining classes as needed to get the data into the TDS without any intervention from the client.
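
The division of labor among these components can be sketched as a dispatch on the data format recorded in the metadata. The Python below is illustrative only and does not use the Gaudi API. DataFinder, convert_xml, convert_root, and the metadata keys "data_format" and "data_ident" are all hypothetical names, and validity checking is left out to keep the focus on dispatch.

```python
# Hypothetical converters: each turns a persistent-data identifier into an
# in-memory object. Real ones would actually read XML, ROOT, or MySQL data.
def convert_xml(ident):
    return {"source": "XML", "ident": ident}


def convert_root(ident):
    return {"source": "ROOT", "ident": ident}


class DataFinder:
    """Fills a dict standing in for the TDS, on demand."""

    def __init__(self, registry_lookup, converters):
        self._lookup = registry_lookup    # calib_type -> metadata dict
        self._converters = converters     # data format -> converter function
        self.tds = {}                     # calib_type -> in-memory data

    def retrieve(self, calib_type):
        if calib_type not in self.tds:
            # Ask the registry where the data lives and in what form.
            meta = self._lookup(calib_type)
            # Pick the converter that understands that persistent form.
            convert = self._converters[meta["data_format"]]
            self.tds[calib_type] = convert(meta["data_ident"])
        return self.tds[calib_type]
```

The client only ever calls retrieve; which converter ran, and whether the registry was consulted at all, stays hidden from it.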

Accessibility and Portability

Implicit in several of the high-level requirements is the assumption that the Calibration Services will be available in any environment in which clients might run, be that at SLAC, at another collaborating institution, or at 30,000 feet on a laptop. (This most likely excludes archiving or any other applications involving writes.) Two techniques will be employed to achieve this:

Access over the network to the central data store
Replication of some part of the central store for local use

Network access is normally preferred when feasible. We expect the calibration data store, broadly speaking, to contain only two kinds of objects visible to applications: MySQL data and (maybe) physical files, such as ROOT files. MySQL has a client/server architecture in which network access is transparent except for possible performance degradation if the network connection is poor. Physical files can be made readily network accessible for reading by, for example, keeping them in a dedicated anonymous ftp area.

In case there is no network connection or only a poor one, users may copy needed files and extract parts of the MySQL database for offline use. They would have to maintain their own local MySQL server and pool of data files. Client programs would have to be told, perhaps via Gaudi job options, to use the local server and local files. (Or, for a really minimal system with just one calibration data set per calibration type, one could probably dispense with MySQL altogether.)

Calibration Services in Action

The principal use cases correspond to the Requirements.

Archive a calibration data set

There are two parts to archiving a calibration data set: create the persistent form of the data in one of the supported formats, then register the data set in the Calibration Registry. How the persistent form is created will vary, depending on the calibration procedure. Registering the calibration just consists of adding a row (the metadata) to a MySQL table. Both operations must be supported for both Gaudi and non-Gaudi applications. Utilities will be provided to write the metadata and to do at least the generic parts of writing the persistent form of the data.

The contents of the metadata will include fields used primarily for selection and others needed for i/o.

Field name	Explanation, Typical contents
Calibration type	E.g., TKR alignment, TKR noisy channel, CAL light asymmetry, ACD pulse height calibration (for purpose of defining Veto threshold) etc.
Flavor	String which may be used to identify calibrations intended for particular applications or analyses. If unspecified, will default to "vanilla".
Serial number	Unique serial number for this record and the corresponding calibration output (TBD: is the serial number unique over all records, just for records for this calibration type, or do we want two serial numbers?)
Software version or data format version	Intent is to supply enough information so that programs wishing to read the data can tell whether or not the data will be readable.
Data format	Refers to medium used for persistent data. May be one of a small list of possibilities, such as MySQL records or ROOT file
Persistent data identifier	Will depend on the format. For ROOT files, this field might be a file spec. For rdbms data, could be an index.
Validity start time	Demarcate time interval for which calibration is known to be valid
Validity end time	Demarcate time interval for which calibration is known to be valid
Procedure completion time	Time the dataset was made
Instrument calibrated	E.g. engineering model, flight instrument, etc.
Calibration procedure level	Possible values will come from a predefined list including 'test', 'development', 'production' and 'superseded'. These terms refer to the procedure being used and say nothing about whether the component being calibrated was deemed acceptable.
Calibration status	Possible values will come from a predefined list including 'OK' and things like 'incomplete' or 'aborted'
Data size	Could be number of bytes, number of records, or something else. Precise meaning will depend on data format and calibration type.
Creator	Could be hardware procedure or software algorithm. Should also include procedure or software version information.
Input description	Name, version, etc. of any input data sets.
Comment field	Anything which might be of interest but doesn't fit in any of the other categories.
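
As a rough illustration of how the fields above might map onto a single table, and of registration as "adding a row", here is a sketch in Python. It uses the standard sqlite3 module purely as a stand-in for MySQL so that it runs without a server. The table name, column names, and column types are assumptions, and a single registry-wide serial number is assumed even though that question is still TBD.

```python
import sqlite3

# Hypothetical layout: one column per metadata field in the table above.
SCHEMA = """
CREATE TABLE calib_registry (
    ser_no      INTEGER PRIMARY KEY AUTOINCREMENT,
    calib_type  TEXT NOT NULL,
    flavor      TEXT NOT NULL DEFAULT 'vanilla',
    fmt_version TEXT,
    data_format TEXT NOT NULL,
    data_ident  TEXT NOT NULL,
    vstart      TEXT NOT NULL,
    vend        TEXT NOT NULL,
    completion  TEXT,
    instrument  TEXT NOT NULL,
    proc_level  TEXT NOT NULL,
    status      TEXT NOT NULL,
    data_size   INTEGER,
    creator     TEXT,
    input_desc  TEXT,
    notes       TEXT
)
"""


def open_registry(path=":memory:"):
    """Create an empty registry (in memory by default)."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def register(conn, **fields):
    """Add one metadata row and return its serial number.

    Column names come from trusted calling code; only the values are
    passed as bound parameters.
    """
    cols = ", ".join(fields)
    marks = ", ".join("?" for _ in fields)
    cur = conn.execute(
        f"INSERT INTO calib_registry ({cols}) VALUES ({marks})",
        tuple(fields.values()))
    conn.commit()
    return cur.lastrowid
```

Leaving out the flavor when calling register yields the default "vanilla", matching the description of that field.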

Access calibration information during event processing

This is presumed to take place only in a Gaudi process, so Gaudi concepts and tools will be freely used. In the course of analyzing an event, if an application requests calibration data of a particular sort the following sequence will ensue.

By the standard Gaudi mechanism a check will be made to see if the data set already exists in the TDS and, if so, whether its validity interval includes the timestamp for the current event.
If the appropriate information is not already in TDS, the Data Finder (i.e., its proxies) will attempt to come up with it by searching the Calibration Registry. If found, the information in the metadata record will be used to dispatch the correct Conversion Service and Converter. (The Driver will be supplied as part of Calibration Services. Some part of the Converter may have to be supplied by the Subsystem.)
In either case the correct data is now in the TDS, where it may be accessed by means of TDS class services.
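
The sequence above amounts to a cache whose entries are reused only while their validity interval covers the current event. A minimal Python sketch follows. It is not Gaudi code: TransientStore and its two callbacks are hypothetical, times are plain numbers, and the interval is assumed to be half-open.

```python
class TransientStore:
    """Hypothetical stand-in for the TDS: one cached data set per type."""

    def __init__(self, find_in_registry, load):
        # (calib_type, event_time) -> metadata dict, or None if not found
        self._find = find_in_registry
        # metadata dict -> in-memory data (conversion service + converter)
        self._load = load
        self._cache = {}   # calib_type -> (vstart, vend, data)

    def get(self, calib_type, event_time):
        entry = self._cache.get(calib_type)
        # Step 1: reuse the cached set only if it is valid for this event.
        if entry is not None and entry[0] <= event_time < entry[1]:
            return entry[2]
        # Step 2: otherwise search the registry and convert what was found.
        meta = self._find(calib_type, event_time)
        if meta is None:
            raise LookupError(
                f"no {calib_type} calibration valid at {event_time}")
        data = self._load(meta)
        self._cache[calib_type] = (meta["vstart"], meta["vend"], data)
        # Step 3: the data is now in the store and is handed to the client.
        return data
```

Consecutive events usually fall within the same validity interval, so in this scheme the registry is consulted only when an event crosses an interval boundary.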

Fetch a particular calibration data set

Assuming enough information is known to uniquely specify the desired calibration output, the metadata may be fetched directly from the MySQL database by SQL commands or via a C++ utility to be supplied. This utility will not require the Gaudi environment.

From the metadata one can determine how the persistent data is to be read and where to find it.
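
For instance, if the serial number is known, a single query returns what is needed to locate and read the data. The sketch below is in Python, with the standard sqlite3 module standing in for MySQL. The table and column names are hypothetical, the sample row is invented for the demonstration, and the placeholder syntax for bound parameters differs between database drivers.

```python
import sqlite3


def fetch_location(conn, ser_no):
    """Return how and where to read the data set with this serial number."""
    row = conn.execute(
        "SELECT data_format, fmt_version, data_ident "
        "FROM calib_registry WHERE ser_no = ?", (ser_no,)).fetchone()
    if row is None:
        raise KeyError(f"no calibration with serial number {ser_no}")
    return {"data_format": row[0], "fmt_version": row[1], "data_ident": row[2]}


# Tiny in-memory registry with one invented row, just to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calib_registry (ser_no INTEGER PRIMARY KEY, "
    "data_format TEXT, fmt_version TEXT, data_ident TEXT)")
conn.execute(
    "INSERT INTO calib_registry VALUES (17, 'XML', '1.0', 'hypothetical/ped.xml')")

location = fetch_location(conn, 17)
```

Here location["data_format"] tells the caller which reader to use and location["data_ident"] tells it where the data is.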

Per-channel history

This is not easily extracted from the primary data structures (Calibration Registry plus the collection of all persistent data). Rather than recomputing it each time it is requested, we could keep additional derived data indexed by channel (or whatever hardware division is most convenient) and updated whenever a new calibration record (of a calibration type relevant to this hardware) is entered into the Calibration Registry, or maybe only for records meeting certain criteria, such as procedure level = production and status = OK. The updating procedure should be invoked automatically.
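
One way to picture the derived data is an index keyed by channel that is appended to each time a qualifying record is registered. The Python sketch below is an illustration under assumptions: ChannelHistory, on_register, and the metadata keys are hypothetical, the per-channel values are assumed to be handed over by whatever writes the calibration, and the filter uses the example criteria mentioned above (procedure level = production, status = OK).

```python
from collections import defaultdict


class ChannelHistory:
    """Derived, per-channel view of the calibrations registered so far."""

    def __init__(self):
        # channel -> list of (validity start, serial number, value)
        self._by_channel = defaultdict(list)

    def on_register(self, meta, per_channel_values):
        """Hook to be called automatically when a record is registered.

        Returns True if the record was added to the history.
        """
        if meta["proc_level"] != "production" or meta["status"] != "OK":
            return False
        for channel, value in per_channel_values.items():
            self._by_channel[channel].append(
                (meta["vstart"], meta["ser_no"], value))
        return True

    def history(self, channel):
        """Entries for one channel, ordered by validity start time."""
        return sorted(self._by_channel.get(channel, []))
```

Because the index is updated at registration time, a history request is a lookup rather than a pass over every persistent data set.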

Automatic report generation

If the information of interest is contained within the metadata, automated report generation will be straightforward. It would be well worth adding a few more fields to the metadata if all summary information needed for standard reports could be captured this way.
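
As a small example of a metadata-only report, the Python sketch below counts calibrations by type and status. The function name and the record keys are hypothetical, and the records are assumed to have been fetched from the Calibration Registry already.

```python
from collections import Counter


def summarize(records):
    """Return one text line per (calibration type, status) pair with a count."""
    counts = Counter((r["calib_type"], r["status"]) for r in records)
    lines = [f"{ctype:<20} {status:<12} {n}"
             for (ctype, status), n in sorted(counts.items())]
    return "\n".join(lines)
```

Nothing here touches the bulk calibration data, which is why reports of this kind are cheap to automate.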

Deliverables

From the above it should be possible to enumerate most of them, probably a pretty scary exercise.

Created: 25 Jan 2002, J. Bogart

Last modified: