Spec for MySQL metadata table GUI

What should an interactive interface to the metadata database do?

Currently the calibration metadata database only has a single table with a row per calibration data file; some of the description below makes the assumption that there is only a single table of interest, but the design should allow for multiple tables (some day).

Make it relatively easy to add new correct (syntactically correct and semantically sensible) rows to the main calibration metadata table.
Allow modification of certain fields of existing entries, including at least vend and proc_level.
Maybe automatically update vend of an existing calibration when a new production calibration of the same calib_type, flavor, instrument is added, or at least ask user whether the change should be made.
(lower priority) Display of (subset of?) columns for rows satisfying certain kinds of conditions.

Since the display of existing data in the table may affect content of a row to be added, it should be possible to keep the display of existing data visible while parameters for a new row are being input. Perhaps have a separate window to input selection criteria for display and to contain the display. In fact, might be nice to have more than one display window. Each should have a refresh or go button, causing the query which generated the display to be reissued.

Adding a metadata row

From the second table below, there are a minimum of 10 values to be supplied by the user. Since there is no easy or natural way to enforce an input ordering on the fields, probably should have separate controls for each field and also a go button. At the time go is pushed the input can be checked for completeness and consistency.

Many of the fields typically have values from a known set. For such fields one could use a menu, containing the known possibilities in the order most likely to be convenient for the user, with one additional item for arbitrary typed input. For vstart and vend provide separate controls for year, month, date and time. The first three at least should allow or perhaps require selection rather than typing.

It will be difficult to provide much help for data_ident, typically used to contain the complete file path for the calibration's bulk data. Perhaps when a strategy for storing production files has been implemented some part of the file path will be deducible from some of the other fields, such as calib_type. If so, will need a mechanism to insure the data_ident is filled last, or at least after the fields on which is it likely to depend.

Possible gui scenario

The following is a functional description of how the gui might be used; it doesn't say much about the literal appearance of the gui. It's meant to spur discussion, not to specify once and for all what gets implemented.

Initial inputs

In order to do anything useful, the gui needs

path to an xml file describing the database
information required to connect to or create a physical database

The exact form of #2 depends on the physical realization. For a MySQL database, it consists of user id, password, host, and port. We could put this information into another file, separate from the one containing the database description. Might as well also be XML. Then the two filenames could be arguments to the gui, or the user could enter them after the gui started up.

Create or connect

For now, the only thing we need to support is connecting to the physical database. Once the connection is made, the application should check whether the schema of the physical database is equivalent or at least compatible with the one in the xml file. If it's not at least compatible the application has to abort or ask for a different set of connection parameters and try again.

At some point we might support creation of a physical database, in addition to connecting to an existing database, but this is not very high priority. We could also have some ability to display the database schema (as described in the xml file) before any association with a physical database, but I don't think it's worth the trouble. Let's hope anyone who wants to see what's in the xml description knows enough about xml and databases to just open the file in an editor and look!

Table selection

Once a connection has been made, the user should have access to a list of tables. This should be only the tables described in the xml file, which may be a proper subset of those in the physical database. In the initial implementation, all further operations are restricted to a single table (that is, each operation may mention only one table, but different operations may specify different tables). This is no real restriction (yet) because there only is one table.

Queries (SELECT)

In order to make a query the gui has to know

table name
column names for columns to be displayed (returned)
additional constraints (WHERE clause), if any

As much as possible, for each part of this information the user should be presented with the set of allowable choices to choose from; typing data in by hand should not be required (though maybe it should be allowed for those who prefer it) unless absolutely necessary. So there should be a list of tablenames for the user to choose from. Once the table has been chosen, all column names can be presented so that the user can choose the ones to be returned, and so forth.

For the initial implementation of the gui, additional constraints may be built up of comparisons (the usual arithmetic comparisons =, <, etc.) and the logical operations NOT, AND, OR. Later we might also support EXISTS.

A comparison operation should be selectable (e.g. from a menu or set of radio buttons); we might also want to allow the user to type in a comparison. Since a comparison normally involves at least one column name, the column should be selectable from a list. Once a column name has been selected for one argument to a comparison, if the user wants to supply a constant for the other argument, e.g.

instrument = "EM"

(s)he should be presented with a list of possibilities (for columns of enum datatype or for char or varchar columns which have a list of preferred values in their xml description) and a default value where one exists for the column.

When the user has entered all the information necessary for the query (or thinks so), (s)he will click on something saying "go" or "commit". At this point the entire query should be checked: are all column names legitimate? are values appropriate for the columns they're intended for? And so forth.

Add a new entry (INSERT)

Here the client needs to specify

table
column values for most columns

The gui should keep track of which columns the user is required to specify, which the user may not specify (e.g. auto-increment columns) and suggest defaults or set of enumerated values for columns where these apply. As with the query, after all input is specified user will somehow indicate "go". Remaining validation, if any, of the request can be done at this time.

Update (UPDATE)

Issues for gui are similar to those in previous sections.

Niceties

None of these are essential, especially for a first attempt, but if possible the implementation of the gui should be able to evolve smoothly to include them:

Cut and paste should be allowed wherever they make sense
Display results from a query in new window (or tab); allow multiple results windows to be active concurrently.
Provide browser when user needs to supply filename. However, also must allow typing since user can include environment variables of the form $(something)

What's missing

These involve the underlying utilities at least as much as the gui.

Ability to accept some form of script input and maybe also direct query output to a file. Of course one can use the mysql command line program for this, but then none of the extra verification in rdbModel has been done.
Log a session (at least a line for any command which accesses the physical database).
Handle queries and updates which need to reference more than one table. This isn't an issue now, but, perhaps in some months, I'm thinking about adding another table (to keep track of validity periods of calibrations in a group rather than individually), in which case certain common operations would need to reference both tables.

Columns of metadata table

Table columns, as defined for MySQL, look like this:

**Constituents of a metadata row as SQL sees them**
Field	Type	Default	Extra
ser_no	mediumint(9)	NULL	auto_increment
instrument	varchar(16)
calib_type	varchar(20)
flavor	varchar(30)	vanilla
data_fmt	varchar(10)
data_size	int(11)	NULL
vstart	datetime	NULL
vend	datetime	NULL
enter_time	timestamp(14)	NULL
locale	varchar(30)	orbit
fmt_version	varchar(12)	NULL
completion	enum('OK','INC','ABORT')	NULL
proc_level	enum('PROD','TEST','DEV','SUPSED')	TEST
prod_start	datetime	NULL
prod_end	datetime	NULL
creator	varchar(255)	NULL
uid	varchar(12)
data_ident	varchar(255)
input_desc	varchar(255)	NULL
notes	varchar(255)	NULL

The table below reflects the point of view of the interactive user. We're only interested in those fields which cannot be sensibly set without human intervention. Some, like ser_no, can or must be set automatically to their proper values. Rows for these items are grey. They will not appear at all in the gui. Some others are only informational or unused. Users may set them, but they don't have much effect. These rows are yellow. If not set, they will get the default values appearing in the table. Of the remainder, some, like calib_type, although not enum as far as sql definition is concerned, are typically used as enum, with an option to supply something not in the standard list. This is denoted in the Type column of the table below by specifying enum and putting an ellipsis at the end of a list of specific values. Note Default here means something different than it did in the first table. Default of "none" means the user must supply something or the row will not be added.

**Constituents of a metadata row as the Gui sees them**
Field	Type	Default	Comments
ser_no	mediumint(9)	N/A	auto_increment
instrument	enum('LAT','EM','CU', ...)	(none)
calib_type	enum('TKR_DeadChan', 'TKR_HotChan', 'CAL_Ped', 'CAL_ElecGain', 'CAL_MuSlope',...)	(none)
flavor	enum ('vanilla', 'ideal', ...)	(none)
data_fmt	enum('XML', 'ROOT',...)		(none)
data_size	int(11)	"unknown"	This field is at most informational, currently entirely unused
vstart	datetime	(none)	Must be <= vend
vend	datetime	(none)	Must be >= vstart
enter_time	timestamp(14)	N/A
locale	enum('orbit', 'NRL', 'SLAC', ...)	(none)
fmt_version	varchar(12)	"unknown"	Currently this information isn't used, but someday it will be. No ideas so far on how to help user with this one.
completion	enum('OK','INC','ABORT')	(none)
proc_level	enum('PROD','TEST','DEV','SUPSED')	(none)
prod_start	datetime	NULL	Set automatically to value of enter_time if dataset proc_level = 'PROD'
prod_end	datetime	NULL	Always initially NULL
creator	varchar(255)	N/A
uid	varchar(12)	N/A
data_ident	varchar(255)	(none)
input_desc	varchar(255)	"No input desription submitted"
notes	varchar(255)	"No notes submitted"

Last modified:
Joanne Bogart