A scheme for calibration data tracking

Background

There are at least two unsolved problems w.r.t. the Cal calibration "pipeline": that is, the sequence that needs to transpire from taking calibration data (charge injection or muon) through making it available to SAS by registering it in the SAS Calibration database.

Proposal

There are two parts. One is to add three new tables to the SAS calibration database. The other is to provide program- or script-callable support for filling appropriate entries into these tables and for making a provisional entry in the main SAS calibration database table. We will probably also want to add a new column (a "foreign key" in RDBMS lingo) to the existing main calibration table.

New tables

There would be three (thanks to Marco for suggesting improvements to my original scheme, which had two). One table (call it Input_Files) would have a row per file input to calibGenCAL (typically these are digi ROOT files). Another, Output_Files, would include all calibGenCAL output files which ultimately get registered as calibrations. Columns for each of these tables would include a unique id (an auto-increment serial number), file name, Online run number, subsystem (always CAL here, but Tracker or ACD might also make use of these tables some day) and whatever else is deemed useful; for example, Input_Files might also include start and stop times for data taking.

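The third table (call it File_relations) would keep track of relations among these files. It would have an entry for each pair of files such that one was an input source for the other. Its columns would include, at minimum, a foreign key identifying the input file and a foreign key identifying the output file.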

Any file, input or output, could appear multiple times in the File_relations table.
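
As a rough illustration of the three tables, here is a minimal sketch using Python's sqlite3 module as a stand-in for the real calibration database. The column names (file_name, online_run, start_time, input_id and so on) and the uniqueness constraints are assumptions made for the example, not part of the proposal.

```python
import sqlite3

# Hypothetical schema: table names follow the text, column names are assumed.
SCHEMA = """
CREATE TABLE Input_Files (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique serial number
    file_name  TEXT NOT NULL UNIQUE,
    online_run INTEGER,
    subsystem  TEXT NOT NULL DEFAULT 'CAL',
    start_time TEXT,                               -- optional data-taking times
    stop_time  TEXT
);
CREATE TABLE Output_Files (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    file_name  TEXT NOT NULL UNIQUE,
    online_run INTEGER,
    subsystem  TEXT NOT NULL DEFAULT 'CAL'
);
CREATE TABLE File_relations (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    input_id  INTEGER NOT NULL REFERENCES Input_Files(id),
    output_id INTEGER NOT NULL REFERENCES Output_Files(id),
    UNIQUE (input_id, output_id)                   -- one row per file pair
);
"""


def create_tracking_tables(conn):
    """Create the three tracking tables on an open connection."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_tracking_tables(conn)

# List the user tables, skipping SQLite's internal bookkeeping tables.
tables = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )
)
print(tables)
```

The UNIQUE constraint on the pair still lets a given file appear in many rows of File_relations; it only prevents recording the same input/output pair twice.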

Do we believe that only one level of input-file-to-output-file is interesting? If not, I don't think it changes the structure of the File_relations table any; it would just extend the back-trace procedure.

The new tables should be accessible from rdbGui without much work: no code at all, probably, and only straightforward additions to a configuration file. I would be inclined to make a small enhancement to rdbGui and underlying stuff so that, from rdbGui, the new tables would be read-only.

Filling the tables

After or as part of a calibGenCAL run, entries would first be made in the Input_Files and Output_Files tables. The process making the entries would keep track of the ids assigned. For example, the part of calibGenCAL which makes CAL_Asym would insert into the Input_Files table entries for any files it uses as input (checking that they haven't already been inserted, if this is a possibility) and into the Output_Files table an entry for its output file. Then, using the unique ids assigned to those files, it would make appropriate entries in File_relations.
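
The filling step might look something like the sketch below, again with sqlite3 standing in for the real database. The helper names (get_or_insert_file, record_run), the column names and the example file names are all hypothetical; for simplicity the sketch also assumes the inputs and the output share one Online run number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, assumed versions of the three tracking tables.
conn.executescript("""
CREATE TABLE Input_Files (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file_name TEXT NOT NULL UNIQUE, online_run INTEGER, subsystem TEXT);
CREATE TABLE Output_Files (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file_name TEXT NOT NULL UNIQUE, online_run INTEGER, subsystem TEXT);
CREATE TABLE File_relations (
    input_id  INTEGER NOT NULL REFERENCES Input_Files(id),
    output_id INTEGER NOT NULL REFERENCES Output_Files(id),
    UNIQUE (input_id, output_id));
""")


def get_or_insert_file(conn, table, file_name, online_run):
    """Return the id of file_name in table, inserting a row if it is new."""
    if table not in ("Input_Files", "Output_Files"):
        raise ValueError("unexpected table: " + table)
    row = conn.execute(
        f"SELECT id FROM {table} WHERE file_name = ?", (file_name,)
    ).fetchone()
    if row is not None:
        return row[0]  # already registered, reuse its id
    cur = conn.execute(
        f"INSERT INTO {table} (file_name, online_run, subsystem) "
        "VALUES (?, ?, 'CAL')",
        (file_name, online_run),
    )
    return cur.lastrowid


def record_run(conn, input_names, output_name, online_run):
    """Register inputs, the output, and one relation row per input."""
    with conn:  # commit on success, roll back on error
        output_id = get_or_insert_file(
            conn, "Output_Files", output_name, online_run)
        for name in input_names:
            input_id = get_or_insert_file(
                conn, "Input_Files", name, online_run)
            conn.execute(
                "INSERT OR IGNORE INTO File_relations (input_id, output_id) "
                "VALUES (?, ?)",
                (input_id, output_id),
            )
    return output_id


# Hypothetical file names; recording the same run twice adds no new rows.
inputs = ["digi_example_a.root", "digi_example_b.root"]
first_id = record_run(conn, inputs, "cal_asym_example.xml", online_run=1)
second_id = record_run(conn, inputs, "cal_asym_example.xml", online_run=1)
n_inputs = conn.execute("SELECT COUNT(*) FROM Input_Files").fetchone()[0]
n_relations = conn.execute("SELECT COUNT(*) FROM File_relations").fetchone()[0]
```

Doing all the inserts for one run inside a single transaction means a failure part-way through leaves no half-recorded relations behind.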

Finally, it would create a new row in the main calibration table, but with at least one value set so that the calibration will not be used for production work without further modification to the row. For example, one could set flavor = "provisional" or proc_level = "TEST". If we add the aforementioned foreign key column to this table, it would be filled with the key from the Output_Files table for the file in question. Before the file could be used for production work, a human being would have to validate it and set flavor or proc_level to standard values.
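
A sketch of the provisional entry and the later validation step follows. The table Calib_main is a hypothetical, much reduced stand-in for the real main calibration table, and the column output_file_id is an assumed name for the proposed foreign key. The "standard" values passed to validate are placeholders, since the text does not name them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Output_Files (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    file_name TEXT NOT NULL UNIQUE
);
-- Hypothetical stand-in; the real main table has many more columns.
CREATE TABLE Calib_main (
    ser_no         INTEGER PRIMARY KEY AUTOINCREMENT,
    calib_type     TEXT NOT NULL,
    flavor         TEXT NOT NULL,
    proc_level     TEXT NOT NULL,
    output_file_id INTEGER REFERENCES Output_Files(id)
);
""")


def register_provisional(conn, calib_type, output_file_id):
    """Insert a row that production jobs should not pick up yet.

    Both markers are set here; per the text either one alone would do.
    """
    with conn:
        cur = conn.execute(
            "INSERT INTO Calib_main "
            "(calib_type, flavor, proc_level, output_file_id) "
            "VALUES (?, 'provisional', 'TEST', ?)",
            (calib_type, output_file_id),
        )
    return cur.lastrowid


def validate(conn, ser_no, flavor, proc_level):
    """Called after a human has checked the file: set standard values."""
    with conn:
        conn.execute(
            "UPDATE Calib_main SET flavor = ?, proc_level = ? WHERE ser_no = ?",
            (flavor, proc_level, ser_no),
        )


def usable_for_production(conn, ser_no):
    """True once neither provisional marker remains on the row."""
    row = conn.execute(
        "SELECT flavor, proc_level FROM Calib_main WHERE ser_no = ?",
        (ser_no,),
    ).fetchone()
    return row is not None and row[0] != "provisional" and row[1] != "TEST"


with conn:
    out_id = conn.execute(
        "INSERT INTO Output_Files (file_name) VALUES (?)",
        ("cal_asym_example.xml",),  # hypothetical file name
    ).lastrowid

ser_no = register_provisional(conn, "CAL_Asym", out_id)
before = usable_for_production(conn, ser_no)
# Placeholder "standard" values, chosen only for the example.
validate(conn, ser_no, flavor="vanilla", proc_level="PROD")
after = usable_for_production(conn, ser_no)
```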

Tracing back to the source

Using the foreign key in the main calibration table, look up all rows in the File_relations table containing it as the output file; then use the foreign keys in the input-file column of those rows to look up files in the Input_Files table.
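
The back-trace can be written as a single join. The sketch below uses sqlite3 with assumed table and column names (Calib_main, output_file_id, input_id, output_id) and made-up sample rows; it is meant only to show the shape of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, cut-down tables plus a few made-up rows for the demonstration.
conn.executescript("""
CREATE TABLE Input_Files  (id INTEGER PRIMARY KEY, file_name TEXT NOT NULL);
CREATE TABLE Output_Files (id INTEGER PRIMARY KEY, file_name TEXT NOT NULL);
CREATE TABLE File_relations (
    input_id  INTEGER NOT NULL REFERENCES Input_Files(id),
    output_id INTEGER NOT NULL REFERENCES Output_Files(id)
);
CREATE TABLE Calib_main (
    ser_no         INTEGER PRIMARY KEY,
    output_file_id INTEGER REFERENCES Output_Files(id)
);
INSERT INTO Input_Files VALUES
    (1, 'digi_a.root'), (2, 'digi_b.root'), (3, 'digi_c.root');
INSERT INTO Output_Files VALUES (10, 'cal_asym_example.xml');
INSERT INTO File_relations VALUES (1, 10), (2, 10);
INSERT INTO Calib_main VALUES (100, 10);
""")


def trace_inputs(conn, ser_no):
    """Return the names of all input files behind one calibration row."""
    rows = conn.execute(
        """
        SELECT i.file_name
        FROM Calib_main AS c
        JOIN File_relations AS r ON r.output_id = c.output_file_id
        JOIN Input_Files    AS i ON i.id = r.input_id
        WHERE c.ser_no = ?
        ORDER BY i.id
        """,
        (ser_no,),
    )
    return [row[0] for row in rows]


sources = trace_inputs(conn, 100)
print(sources)  # ['digi_a.root', 'digi_b.root']
```

The file digi_c.root is registered but has no relation row pointing at the output, so it does not appear in the result.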
