Introduction to FSW Memory Leak Handling
The FSW group has adopted a strategy that makes memory leaks, if not impossible, at least very traceable. The rules of the game are:
Allocators
To facilitate this philosophy, PBS (Processor Basic Services) provides a couple of memory managers: FPA (Fixed Packet Allocator) and RNG (Ring Buffer Allocator). Others will be added, as needed. Typically, during task initialization, one uses malloc to allocate a chunk of memory which is then handed over to FPA or RNG to be managed. On task shutdown, these resources are returned. Given that startup and shutdown are rare and well-defined events, memory leaks via the malloc route should also be easy to track down. Admittedly, in this case, the real trick is realizing that there is a problem.
Drivers, the Exception to the Rule
For performance reasons, drivers must loan their memory out to (or borrow it from) other tasks. FSW tries two tactics to ameliorate this problem:
The latter approach is preferable, but not always practical. The LCB driver never allocates any packets, but it provides other tasks with pointers to memory locations in a shared ring buffer. So, sharing is indeed going on. Unfortunately, for efficiency reasons, we must live with this.
This philosophy is (or soon will be) more successful in the 1553 driver. The driver will read the message from hardware and, after verifying the message's integrity, call a task-supplied memory-allocation routine. If the allocation routine succeeds, the driver will copy the message into the task's memory, then dispatch the message to the task. If the task servicing a given APID has squandered its memory (or gotten behind in its processing), messages to it may get discarded. Tasks servicing other APIDs, however, will not be impeded. Thus, the 1553 implementation recovers some of the original philosophy.
Bad Pointers
Given that we are working in a real-time environment, there is no way to totally guard against bad pointers. Any piece of code that first checks a pointer for integrity and then uses it in a non-interlocked fashion always has a hole between the checking and the usage. Doing it "correctly" would be tantamount to writing a single-threaded piece of code. This is not to imply that anything less than 100% effective is worthless; just don't believe that there is a "magic bullet". Better checking by users would help, but the cure could be worse than the disease:
ptr = access_control_block ();
if (ptr == NULL) return BAD_POINTER;
Although this has some value, the area it covers (in the space of possible errors) is small. Unless you have some reason to believe that access_control_block can return a NULL as part of its normal course of doing business, this check is next to worthless. Any other bad value it returns is the result of some overwrite/corruption problem that this test provides no protection against. A better check is to plant some integrity information in the object being accessed. Something like:
ptr = access_control_block ();
if (ptr->self_pointer != ptr) return BAD_POINTER;
is much better. If ptr is bad, or the structure it references has been corrupted, this is a much stronger (but still imperfect) test.
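The self-pointer idea can be sketched in C as follows. This is only an illustration, not FSW code: the control_block layout, control_block_init, control_block_read, and the OK/BAD_POINTER values are all hypothetical names chosen for the example. The NULL test is kept solely so that the self-pointer comparison does not dereference NULL; as noted above, a truly wild pointer may still fault when it is dereferenced, so the check remains imperfect.

```c
#include <stddef.h>
#include <stdint.h>

#define OK          0
#define BAD_POINTER (-1)   /* hypothetical status code */

/* Hypothetical control block: self_pointer holds the block's own address,
 * planted once at initialization and never changed afterwards. */
typedef struct control_block {
    struct control_block *self_pointer;
    uint32_t              payload;
} control_block;

/* Plant the integrity information when the block is created. */
static void control_block_init(control_block *cb, uint32_t payload)
{
    cb->self_pointer = cb;
    cb->payload      = payload;
}

/* Read the payload only if the block still points at itself. */
static int control_block_read(const control_block *cb, uint32_t *out)
{
    if (cb == NULL)             return BAD_POINTER;  /* avoid dereferencing NULL */
    if (cb->self_pointer != cb) return BAD_POINTER;  /* wrong address or overwritten block */
    *out = cb->payload;
    return OK;
}
```

A pointer that lands on the wrong structure, or on a block whose first word has been overwritten, fails the comparison, which is the class of corruption the plain NULL test cannot see.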
There is an old expression, "don't check for errors that you don't plan on handling". There is some truth in this statement, but it is misleading. One should separate error detection from error correction. Detection is a worthwhile activity, even if you don't know what to do (except halt the system). At least you can stop the problem as close to its origin as possible. Allowing the system to continue is like spreading a disease. Sooner or later, some innocent victim is going to use a bad piece of information, possibly corrupting one of their data structures. Tracking this mess back to its origins is, well, a mess.
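One way to keep detection separate from correction is to route every integrity check through a single halt hook, so that the checking code never needs to know what the response will be. The sketch below assumes this design. INTEGRITY_CHECK, halt_hook, msg_header, and MSG_MAGIC are hypothetical names invented for the example and are not part of PBS or any FSW package.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_MAGIC 0x4D534721u   /* hypothetical marker planted in every header */

typedef struct {
    uint32_t magic;
    uint32_t length;
} msg_header;

/* The response to a detected error lives behind one function pointer. */
typedef void (*halt_fn)(const char *file, int line, const char *what);

/* Default response: report where the problem was detected, then stop. */
static void default_halt(const char *file, int line, const char *what)
{
    fprintf(stderr, "integrity failure: %s at %s:%d\n", what, file, line);
    abort();
}

/* Alternative hook that only counts detections (useful on a test stand). */
static int halt_count = 0;
static void counting_halt(const char *file, int line, const char *what)
{
    (void)file; (void)line; (void)what;
    halt_count++;
}

static halt_fn halt_hook = default_halt;

/* Detection only: the macro states what must hold, not how to recover. */
#define INTEGRITY_CHECK(cond) \
    do { if (!(cond)) halt_hook(__FILE__, __LINE__, #cond); } while (0)

static uint32_t msg_length(const msg_header *m)
{
    INTEGRITY_CHECK(m != NULL && m->magic == MSG_MAGIC);
    return m->length;   /* reached on failure only if the hook returns */
}
```

With the default hook, a corrupted header stops the system at the point of detection rather than letting the bad length propagate to some later consumer. Setting halt_hook to counting_halt shows that the response can be replaced without touching any of the checks.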