During event processing, the right calibration data
From the above it's clear that we need to deal with two kinds of information: the calibration data proper, and information about a particular calibration data set, such as its period of validity. The latter will be called the metadata for the data set. The entire collection of metadata resides in the Calibration Registry. Metadata include, but are not limited to, criteria used in searching for a particular data set. Metadata fields are sufficiently generic that nearly all of them will be meaningful for all or almost all calibration types. All of this (highly structured, searchable) argues for keeping the metadata in a relational database. Current plans are to use MySQL, which has several desirable properties, including some advantages over Oracle.
Calibration data, as opposed to metadata, come in many forms. Output from a calibration procedure may be small or large, of fixed or variable size, more or less structured, depending on the procedure and the state of the instrument at the time of the calibration. A relational database representation would be suitable, perhaps optimal, for some procedures, but not others. Fortunately there is no requirement or even preference that all calibration data be stored the same way. It would be sensible to stick to a small number of supported formats since software resources are not infinite. Two or three (ROOT file, XML file, MySQL table??) ought to be enough to avoid any serious awkwardness in representing and accessing the data.
A Gaudi data service, as described in the Gaudi Developer's Guide, will be implemented for calibration data. Such a service handles all the details of finding the right data set and ensuring that it is available in an in-memory representation (in Gaudi terminology, the Transient Data Store, or TDS). The client application accesses the data exclusively via TDS classes; it has no need to know anything about the persistent form of the data, nor does it need to know how to find a particular data set among several of the same kind.
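The division of labor described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Gaudi API: `TransientCalibStore`, `CachedCalib` and the `loader` callback are hypothetical names, and times are plain numbers. The point is only that the client asks for a calibration type and an event time, and the store decides whether the in-memory copy is still valid or must be refreshed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CachedCalib:
    """In-memory (transient) copy of one calibration data set."""
    valid_from: float   # validity start (hypothetical numeric time)
    valid_to: float     # validity end, exclusive
    payload: object     # calibration data in its in-memory form


class TransientCalibStore:
    """Hypothetical stand-in for the transient side of a calibration service."""

    def __init__(self, loader: Callable[[str, str, float], CachedCalib]):
        # loader(calib_type, flavor, event_time) finds and reads the right
        # persistent data set; the client never calls it directly.
        self._loader = loader
        self._cache: Dict[Tuple[str, str], CachedCalib] = {}

    def get(self, calib_type: str, event_time: float, flavor: str = "vanilla"):
        key = (calib_type, flavor)
        entry = self._cache.get(key)
        # Reload only when nothing is cached or the event time falls
        # outside the validity interval of the cached data set.
        if entry is None or not (entry.valid_from <= event_time < entry.valid_to):
            entry = self._loader(calib_type, flavor, event_time)
            self._cache[key] = entry
        return entry.payload
```

A client would call something like `store.get("TKR alignment", event_time)` and never see file names, formats or database queries.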
Components of the Calibration Data Service will include
The Data Finder makes use of the remaining classes as needed to get the data into the TDS without any intervention from the client.
Implicit in several of the high-level requirements is the assumption that the Calibration Services will be available in any environment in which clients might run, be that at SLAC, at another collaborating institution, or at 30,000 feet on a laptop. (This most likely excludes archiving or any other applications involving writes.) Two techniques will be employed to achieve this:
Network access is normally preferred when feasible. We expect the calibration data store, broadly speaking, to contain only two kinds of objects visible to applications: MySQL data and (maybe) physical files, such as ROOT files. MySQL has a client/server architecture in which network access is transparent except for possible performance degradation if the network connection is poor. Physical files can be made readily network accessible for reading by, for example, keeping them in a dedicated anonymous ftp area.
In case there is no network connection or only a poor one, users may copy needed files and extract parts of the MySQL database for offline use. They would have to maintain their own local MySQL server and pool of data files. Client programs would have to be told, perhaps via Gaudi job options, to use the local server and local files. (Or, for a really minimal system with just one calibration data set per calibration type, one could probably dispense with MySQL altogether.)
The principal use cases correspond to the Requirements.
There are two parts to archiving a calibration data set: create the persistent form of the data in one of the supported formats, then register the data set in the Calibration Registry. How the persistent form is created will vary, depending on the calibration procedure. Registering the calibration just consists of adding a row (the metadata) to a MySQL table. Both operations must be supported for both Gaudi and non-Gaudi applications. Utilities will be provided to write the metadata and to do at least the generic parts of writing the persistent form of the data.
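As an illustration of the registration step, here is a minimal sketch in Python. It uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs anywhere. The table name `calib_registry`, the column names and the function `register_calibration` are all assumptions made for the example, and only a handful of the planned metadata fields are included.

```python
import sqlite3

# Hypothetical, heavily trimmed metadata table (sqlite3 standing in for MySQL).
SCHEMA = """
CREATE TABLE IF NOT EXISTS calib_registry (
    ser_no      INTEGER PRIMARY KEY AUTOINCREMENT,
    calib_type  TEXT NOT NULL,
    flavor      TEXT NOT NULL DEFAULT 'vanilla',
    data_fmt    TEXT NOT NULL,
    data_ident  TEXT NOT NULL,
    vstart      TEXT NOT NULL,
    vend        TEXT NOT NULL
)
"""


def register_calibration(conn, calib_type, data_fmt, data_ident,
                         vstart, vend, flavor="vanilla"):
    """Add one metadata row and return the serial number assigned to it."""
    with conn:  # commits on success, rolls back if the insert fails
        cur = conn.execute(
            "INSERT INTO calib_registry "
            "(calib_type, flavor, data_fmt, data_ident, vstart, vend) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (calib_type, flavor, data_fmt, data_ident, vstart, vend),
        )
    return cur.lastrowid
```

The persistent data would be written first and registered second, so that a row in the registry never points at data that does not yet exist.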
The contents of the metadata will include fields used primarily for selection and others needed for i/o.
| Field name | Explanation, Typical contents |
|---|---|
| Calibration type | E.g., TKR alignment, TKR noisy channel, CAL light asymmetry, ACD pulse height calibration (for purpose of defining Veto threshold) etc. |
| Flavor | String which may be used to identify calibrations intended for particular applications or analyses. If unspecified, will default to "vanilla". |
| Serial number | Unique serial number for this record and the corresponding calibration output (TBD: is the serial number unique over all records, just for records for this calibration type, or do we want two serial numbers?) |
| Software version or data format version | Intent is to supply enough information so that programs wishing to read the data can tell whether or not the data will be readable. |
| Data format | Refers to medium used for persistent data. May be one of a small list of possibilities, such as MySQL records or ROOT file |
| Persistent data identifier | Will depend on the format. For ROOT files, this field might be a file spec. For rdbms data, could be an index. |
| Validity start time | Demarcate time interval for which calibration is known to be valid |
| Validity end time | Demarcate time interval for which calibration is known to be valid |
| Procedure completion time | Time the dataset was made |
| Instrument calibrated | E.g. engineering model, flight instrument, etc. |
| Calibration procedure level | Possible values will come from a predefined list including 'test', 'development', 'production' and 'superseded'. These terms refer to the procedure being used and say nothing about whether the component being calibrated was deemed acceptable. |
| Calibration status | Possible values will come from a predefined list including 'OK' and things like 'incomplete' or 'aborted' |
| Data size | Could be number of bytes, number of records, or something else. Precise meaning will depend on data format and calibration type. |
| Creator | Could be hardware procedure or software algorithm. Should also include procedure or software version information. |
| Input description | Name, version, etc. of any input data sets. |
| Comment field | Anything which might be of interest but doesn't fit in any of the other categories. |
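
To make the field list concrete, the sketch below models one metadata record as a Python dataclass. The attribute names are hypothetical, the sets of allowed values contain only those mentioned in the table (the real predefined lists may be longer), and the check that the validity interval is non-empty is an added assumption rather than a stated requirement.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Values named in the table above; the actual lists are not yet fixed.
PROC_LEVELS = {"test", "development", "production", "superseded"}
STATUSES = {"OK", "incomplete", "aborted"}


@dataclass
class CalibMetadata:
    calib_type: str             # e.g. "TKR alignment"
    data_format: str            # e.g. "ROOT", "XML", "MySQL"
    data_ident: str             # file spec or index, depending on format
    valid_start: datetime
    valid_end: datetime
    completion_time: datetime
    instrument: str             # e.g. "engineering model"
    proc_level: str
    status: str
    creator: str
    flavor: str = "vanilla"     # default when unspecified
    serial_no: Optional[int] = None   # assigned on registration
    format_version: str = ""
    data_size: Optional[int] = None
    input_description: str = ""
    comment: str = ""

    def __post_init__(self):
        if self.proc_level not in PROC_LEVELS:
            raise ValueError(f"unknown procedure level: {self.proc_level!r}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        # Assumption: an empty or inverted validity interval is an error.
        if self.valid_end <= self.valid_start:
            raise ValueError("validity end must be later than validity start")
```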
This is presumed to take place only in a Gaudi process, so Gaudi concepts and tools will be freely used. In the course of analyzing an event, if an application requests calibration data of a particular sort, the following sequence will ensue.
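The selection at the heart of such a request, picking the record whose validity interval covers the event time, might be sketched as a single query over the metadata. This is only an illustration, again in Python with `sqlite3` standing in for MySQL. The table and column names are hypothetical, timestamps are assumed to be stored as `YYYY-MM-DD HH:MM:SS` strings (which compare correctly as text), and the rules "production and OK only" and "highest serial number wins" are assumptions, not decisions recorded in this document.

```python
import sqlite3

# Hypothetical registry layout used only for this illustration.
DEMO_SCHEMA = """
CREATE TABLE calib_registry (
    ser_no      INTEGER PRIMARY KEY,
    calib_type  TEXT, flavor TEXT,
    data_fmt    TEXT, data_ident TEXT,
    vstart      TEXT, vend TEXT,
    proc_level  TEXT, status TEXT
)
"""


def find_calibration(conn, calib_type, event_time, flavor="vanilla"):
    """Return (ser_no, data_fmt, data_ident) for the best match, or None."""
    return conn.execute(
        """
        SELECT ser_no, data_fmt, data_ident
          FROM calib_registry
         WHERE calib_type = ? AND flavor = ?
           AND vstart <= ? AND ? < vend
           AND proc_level = 'production' AND status = 'OK'
         ORDER BY ser_no DESC   -- assumed tie-break: newest record wins
         LIMIT 1
        """,
        (calib_type, flavor, event_time, event_time),
    ).fetchone()
```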
Assuming enough information is known to uniquely specify the desired calibration output, the metadata may be fetched directly from the MySQL database by SQL commands or via a C++ utility to be supplied. This utility will not require the Gaudi environment.
From the metadata one can determine how the persistent data is to be read and where to find it.
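One way to go from metadata to data is to dispatch on the data format field and hand the persistent data identifier to a format-specific reader. The sketch below shows the idea in Python. The registry `READERS`, the decorator `reader_for` and the function `load_persistent` are hypothetical names, and only an XML reader is filled in, using the standard-library parser. Readers for other supported formats would be registered the same way.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Maps a data-format name from the metadata to a function that reads it.
READERS: Dict[str, Callable[[str], object]] = {}


def reader_for(data_format):
    """Decorator registering a reader for one supported data format."""
    def decorate(func):
        READERS[data_format] = func
        return func
    return decorate


@reader_for("XML")
def read_xml(ident):
    # For file-based formats the identifier is taken to be a file spec.
    return ET.parse(ident).getroot()


def load_persistent(data_format, data_ident):
    """Read persistent calibration data using the reader for its format."""
    try:
        reader = READERS[data_format]
    except KeyError:
        raise ValueError(f"unsupported data format: {data_format!r}") from None
    return reader(data_ident)
```

Keeping the list of readers small mirrors the intent to support only two or three persistent formats.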
This is not easily extracted from the primary data structures (Calibration Registry plus the collection of all persistent data). Rather than recomputing it each time it is requested, we could keep additional derived data indexed by channel (or whatever hardware division is most convenient) and updated whenever a new calibration record (of a calibration type relevant to this hardware) is entered into the Calibration Registry, or maybe only for records meeting certain criteria, such as procedure level = production and status = OK. The updating procedure should be invoked automatically.
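A derived index of this kind could be as simple as a map from channel to the serial numbers of the calibration records that touch it, maintained by a hook that runs on every registration. The following Python sketch assumes hypothetical names throughout (`ChannelHistory`, `on_new_record`) and assumes that the list of affected channels is available when a record is registered. It applies the "production and OK" filter mentioned above as one possible criterion.

```python
from collections import defaultdict


class ChannelHistory:
    """Hypothetical derived index: channel id -> serial numbers, oldest first."""

    def __init__(self):
        self._by_channel = defaultdict(list)

    def on_new_record(self, serial_no, proc_level, status, channels):
        """Hook intended to run automatically when a record is registered.

        Returns True if the record was indexed, False if it was filtered out.
        """
        if proc_level != "production" or status != "OK":
            return False
        for channel in channels:
            self._by_channel[channel].append(serial_no)
        return True

    def history(self, channel):
        """Serial numbers of indexed calibrations affecting this channel."""
        return list(self._by_channel.get(channel, []))
```

Because the index is derived, it can always be rebuilt from the Calibration Registry and the persistent data if it is lost or its criteria change.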
If the information of interest is contained within the metadata, automated report generation will be straightforward. It would be well worth adding a few more fields to the metadata if all summary information needed for standard reports could be captured this way.
From the above it should be possible to enumerate most of them, probably a pretty scary exercise.
Created: 25 Jan 2002, J. Bogart
Last modified: