Core Minutes 4/8/2014
ScienceTools: (Jim) There are three items since the last meeting; see Science Tools Development Notes for details.
FSSC: (Richard) The new ST tag that FSSC requested is in their hands. The new FT2 files (FT2SECONDS) have also been released.
Oracle servers: (Richard) They're on their last legs. Recently the primary server crashed and had to be rebuilt; it is now the back-up server. Meanwhile, Brian is working on Pipeline updates. He needs a live server (with non-trivial data) to finish testing, hopefully by the end of April. We're thinking about borrowing a server for him to use in his development work while proceeding to replace the production servers.
Archiver: (Tom G.) The venerable Archiver, which moves data to HPSS tape, stopped working a couple of weeks ago. As Steve Tether has pointed out, ISOC depends on it, so we need to maintain it. It's up now and various related issues have been addressed. There are still some mysteries.
The immediate cause of the stoppage was probably the decommissioning of glastlnx20 (its functions have been taken over by fermilnx-v03). The situation was complicated by the fact that we have three active cron-like systems with somewhat different characteristics: cron, trscron and a home-grown tool that Navid wrote. After some detective work it was evident that the Archiver function had been assigned to glastlnx20 and so needed to be moved. There are other entries in one or more of the crons (e.g., for the old CMT release manager) which, at this point, can only cause trouble if they're active. Clean-up is in progress.
Other forms of clean-up were by-products of the Archiver gambit. The glast account had received O(100,000) emails. Since no one logs in, they just kept accumulating. Certain kinds of errors result in rows being written to a db log table. There can be very many such errors. (Richard) Could Nagios help with the email problem? (Tom) Yes, it could monitor the size of the mailbox file. It would be nice if it could also keep track of the db log table.
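A minimal sketch of the mailbox check Tom describes, written as a Nagios-style plugin in Python. It relies on the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). The function name, the mailbox path and the thresholds are all hypothetical and were not discussed at the meeting.

```python
import os
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_mailbox(path, warn_bytes, crit_bytes):
    """Return (exit_code, status_line) for the mailbox file at `path`."""
    try:
        size = os.path.getsize(path)
    except OSError as exc:
        # Missing or unreadable file: report UNKNOWN rather than guess.
        return UNKNOWN, f"MAILBOX UNKNOWN - cannot stat {path}: {exc}"
    # Text after '|' is Nagios performance data.
    perfdata = f"size={size}B;{warn_bytes};{crit_bytes}"
    if size >= crit_bytes:
        return CRITICAL, f"MAILBOX CRITICAL - {size} bytes | {perfdata}"
    if size >= warn_bytes:
        return WARNING, f"MAILBOX WARNING - {size} bytes | {perfdata}"
    return OK, f"MAILBOX OK - {size} bytes | {perfdata}"


def main(argv):
    """Entry point; path and thresholds below are illustrative assumptions."""
    path = argv[1] if len(argv) > 1 else "/var/spool/mail/glast"  # hypothetical
    code, line = check_mailbox(path, 50 * 1024**2, 200 * 1024**2)
    print(line)
    return code
```

Installed as a plugin, the script would end with `sys.exit(main(sys.argv))` so that Nagios sees the exit code. A similar check for the db log table would need a row-count query against the database, which is not sketched here.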
Reprocessing: (Leon) Need to redo merit files. This can be done either with Gleam or TMine. The first seemed too daunting so we're leaning towards TMine. Eric and Alex needed to fix up some things in TMine, now done. (Tom G.) A task to be used for testing (creates some output to be examined, but is not configured for production) is ready to go. Waiting on the new TMine. Initial trials (with TMine 3.3.4) show that TMine is surprisingly slow: about one million events per hour of CPU! Gleam is faster. (Richard) How long is the production task likely to take? (Tom) Assuming we have a thousand machines at our disposal, a few days.
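The arithmetic behind Tom's estimate can be sketched as below. The rate (about one million events per CPU-hour) and the machine count (a thousand) come from the discussion; the total number of events was not stated, so it is left as an input, and the function name is hypothetical. The sketch also assumes one job per machine and perfect scaling, which a real production run would not achieve.

```python
def estimate_wall_days(total_events, events_per_cpu_hour, machines):
    """Rough wall-clock days to process `total_events`.

    Assumes one job per machine and ideal parallel scaling
    (no queueing, I/O contention or failed jobs).
    """
    if events_per_cpu_hour <= 0 or machines <= 0:
        raise ValueError("rate and machine count must be positive")
    cpu_hours = total_events / events_per_cpu_hour  # total CPU time needed
    wall_hours = cpu_hours / machines               # spread across machines
    return wall_hours / 24.0
```

For example, `estimate_wall_days(n, 1e6, 1000)` gives the number of days for `n` events at the quoted TMine rate; each billion events costs roughly an hour of wall-clock time under these assumptions.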
(Richard) Another TMine note: Alex, now in Chicago, took part in a dark energy challenge to distinguish stars from galaxies. His TMine entry won!
Mountain Lion (Tom S.) Still digging to understand what the differences are between our environment and FSSC. We're ostensibly using the same compiler, but it's not really 100% the same. For example, our build pulls in different includes. One thing he might try is to use the vanilla gcc available via MacPorts.
lsf (Richard) The Computer Center is pushing to get rid of lsf on rhel5. We still need to build L1 releases on rhel5. Will this upset RM? Can we move to Jenkins? (Tom S.) We're already using Jenkins for rhel5 32-bit builds. It would not be especially difficult to move builds for all OSes to Jenkins; we just need a dedicated box running each OS of interest. We're still using lsf to launch Jenkins jobs. That could be changed if necessary.
Announcements
Opening Day!