|
The FSW group's design philosophy is based on a few key concepts:
- Risk Management
Several design decisions minimize the ability of the FSW,
however aberrant, to endanger the mission:
- active monitoring
Each processor is monitored by a hardware watchdog circuit.
If the watchdog is not "refreshed" on a timely basis,
it forces a reboot of the processor.
In support of this function,
the FSW Watchdog task performs periodic surveys of the other tasks;
if any task fails to respond properly,
no refresh message will be issued.
- limited responsibility
Although the FSW controls some environmental settings
(e.g., temperature), it manages only fine control.
All safety limits are maintained
by the Spacecraft hardware and software.
- prophylactic rebooting
If the FSW encounters a serious processing error,
it does not try to continue.
Instead, it reboots the processor involved,
saving a memory image for diagnostic evaluation.
This minimizes the effects of "bit flips" in RAM, etc.
- self-protecting hardware
The LAT hardware is designed to be self-protecting;
nothing that the FSW does should be able to harm it.
Similarly, the spacecraft and GBM are not supposed
to honor FSW requests that would put them at risk.
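
The active-monitoring scheme described above can be sketched as follows.
This is only an illustrative model, not the FSW implementation:
the names HardwareWatchdog, survey_and_refresh, and run are hypothetical,
the hardware circuit is simulated as a simple countdown,
and each task is represented by a callable that returns True
when it responds properly.

```python
from typing import Callable, Dict


class HardwareWatchdog:
    """Hypothetical stand-in for the hardware watchdog circuit.

    It counts down once per tick and forces a "reboot" (here, just a flag)
    if it reaches zero without having been refreshed.
    """

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.rebooted = False

    def refresh(self) -> None:
        # Restart the countdown.
        self.remaining = self.timeout_ticks

    def tick(self) -> None:
        if self.rebooted:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.rebooted = True


def survey_and_refresh(tasks: Dict[str, Callable[[], bool]],
                       watchdog: HardwareWatchdog) -> bool:
    """Survey every task; refresh the watchdog only if all respond properly."""
    all_ok = all(respond() for respond in tasks.values())
    if all_ok:
        watchdog.refresh()
    return all_ok


def run(cycles: int,
        tasks: Dict[str, Callable[[], bool]],
        watchdog: HardwareWatchdog) -> bool:
    """Run survey/tick cycles and report whether a reboot was forced."""
    for _ in range(cycles):
        survey_and_refresh(tasks, watchdog)
        watchdog.tick()
    return watchdog.rebooted
```

The point of the design is that the monitoring task never decides to reboot;
it only withholds the refresh, and the hardware does the rest.
A single unresponsive task is therefore enough to trigger a reboot
once the timeout elapses.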
- Flexibility
Both the LAT instrument and the FSW are designed
for extreme flexibility:
- instrument configuration
The LAT instrument contains more than two million bits
of configuration data, stored in ~100,000 configuration "registers".
These can be used to disable sensors,
compensate for changing component characteristics, etc.
Various sets of configuration settings can be used
to achieve desired scientific results,
manage environmental demands, etc.
A fresh configuration is loaded before each observation session
and dumped at the beginning and end of the session.
This gives ground-based engineers the ability
to analyze how the configuration
(including any mid-session changes)
might have affected the observed data.
- software modification
Aside from the Primary Boot Code,
the FSW can be updated or replaced
(e.g., from ground-based telemetry or another processor).
This can be used to compensate for hardware failures,
repair late-surfacing software bugs,
or institute entirely new behavior.
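
As a rough illustration of how the session dumps could be used on the ground,
the sketch below compares a beginning-of-session dump with an end-of-session dump
and reports the registers whose values differ.
The representation is an assumption made for the example:
each dump is modeled as a mapping from a register address to its value,
and diff_dumps is a hypothetical helper, not part of the FSW or its ground tools.

```python
from typing import Dict, Optional, Tuple

RegisterDump = Dict[int, int]  # assumed form: register address -> value


def diff_dumps(start: RegisterDump,
               end: RegisterDump) -> Dict[int, Tuple[Optional[int], Optional[int]]]:
    """Return {address: (value_at_start, value_at_end)} for registers that differ.

    A register that appears in only one dump is reported with None
    on the side where it is missing.
    """
    changed = {}
    for address in sorted(set(start) | set(end)):
        before = start.get(address)
        after = end.get(address)
        if before != after:
            changed[address] = (before, after)
    return changed
```

An empty result would indicate that the two dumps agree;
any entries point to registers whose values changed between the two dumps.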
- Hardware Redundancy
Many portions of the LAT hardware are duplicated,
allowing errors to be detected and spare hardware to be swapped in.
For example, there are spare EPU and SIU processors,
a spare GASU, and hardware error detection and correction for all memory.
- Parallel Development
The FSW group's suite of code management tools
improves its ability to work in parallel.
An engineer can develop and test code changes,
using production-quality (but limited-capability) versions
of other programmers' software.
As new features become available,
they can be "published" in Development and/or Production versions.
These design decisions greatly simplify the lives
of the FSW engineers.
Because they do not have to concern themselves
with complicating factors
(e.g., mission-critical issues
or fail-soft performance in the face of failures),
they are free to use simpler designs.
This, in turn, leads to speedier development
and more reliable software.
|