The Earthworm effort began as a grass-roots initiative by developers. As such, its spirit was to focus on producing high-quality code and to minimize meetings, rules, and procedures. However, success brings problems: as the number of contributors and users has grown, so has the need for a stated set of operating procedures. The following is therefore offered in the spirit of "A loose consensus and some working code."
The Earthworm effort has two objectives: first, to provide a rapid-response system suitable for critical monitoring applications; second, to serve as a vehicle for integrating the products of various seismic installations into a common software package available to all. The first objective implies that the system be robust and reliable, which, in turn, requires a closely knit organization to provide rigorous standards, testing, and rapid bug-fixing. It further requires that the system be maintainable and suitable for use at a variety of installations, including those with modest resources. The second objective leads toward a policy of openly including various offerings, produced for a variety of purposes and operating environments, and therefore engineered to varying degrees of robustness and reliability.
In response to these needs, a group known as Earthworm Central has evolved; it maintains the Earthworm software, accepts contributions, develops code, produces documentation, and releases Earthworm versions. This group is also responsible for quality assurance and bug fixes. There are currently three rough categories of software within the Earthworm effort: core, contributed, and encapsulated.
Core software is intended to meet the requirements of the mission-critical objective. The focus here is on maintaining its reliability, maintainability, robustness, and longevity, which in turn come down to issues such as portability, failure modes, and error detection, processing, and recovery. Core software is modified as needed, under the control of Earthworm Central, to fix errors and provide enhancements. Distribution is mainly through numbered releases, plus patches of varying formality depending on the urgency of the fix.
Contributed software consists of ancillary programs submitted for inclusion with the Earthworm distribution that, for whatever reason, do not fit into the core category. These programs are distributed as-is. An index and descriptions of them will be maintained.
A few exotic codes belong to the encapsulated category. These are part of the core offering but are maintained by their original authors rather than by Earthworm Central, either because of the complexity of the algorithm or because they interface to other systems that may be changing. Examples include Hypoinverse and the 'coupler' package to the NOAA tsunami warning system. The author, or the author's institution, is responsible for the quality and maintenance of such code.
Anyone is welcome to create and contribute software. As mentioned above, almost any relevant software will be accepted into the distribution as contributed software; we ask only that source code, some documentation, and a link to the author be provided. Core software is usually created or solicited in response to user needs. The objective in such cases is to offer the highest-quality code, in terms of the requirements above, as quickly as possible. Once acquired, such code is normally reviewed and released to selected sites for testing. Any changes required as a result of testing and review are communicated to the author, and may then be implemented either by the author or by others, as schedule and available resources dictate.
Contributed software is modified only at the author's request. The author may simply ask that the version currently in the distribution be replaced with a new one, and it will be replaced on the Earthworm ftp site.
New versions of encapsulated software are generally accepted as they are produced, and released by various methods as required by the urgency of the situation. Any observed malfunctions are reported to the author.
Since Earthworm Central is responsible for the performance of core software, changes to core are made under its control. Reported bugs and deficiencies are discussed, and fixes are assigned, reviewed, and incorporated by Earthworm Central as required. Enhancements produced by others are likewise evaluated and incorporated by Earthworm Central.
The very idea of coding standards is noxious and intrusive: it invades a developer's creative privacy (limited as it is), stifles innovation, and destroys morale. At best, standards are ignored; at worst, they incite a counter-productive reaction. Yet if the system is to have any hope of being portable, maintainable, and fit for mission-critical use, some common conventions are needed. The intent here is therefore to state coding objectives rather than standards, and to explain traditional practices and conventions as they have (not necessarily as they should have) evolved within the Earthworm group.
One module, one function: We've found (the hard way) that the principle 'a module should do only one thing' is extremely important. It is more expedient to write one module that performs several related functions, but the result is a complex module with numerous switches and options, and a maintenance and stability problem, since enhancements to one function may affect the others. Single-function modules, on the other hand, result in code that is simpler to understand and maintain. One drawback is that separate but similar modules may end up containing identical code. The solution is to move such code into utility functions and place those functions in the utility library (/src/libsrc/util).
One input, one output: In principle, a module can connect to any number of transport rings and use any number of 'back-door' communication schemes. However, the idea of standard-in and standard-out (one input ring, one output ring) has merit: it is the basis of the 'erector set' feature of Earthworm, which allows users to assemble custom systems. In practice, we've found that modules with multiple input and output streams quickly reduce flexibility. Apart from performance, there is no harm in a module writing various kinds of messages onto one output ring, and contemporary hardware can easily support very high traffic on transport rings.
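To make the one-input, one-output idea concrete, here is a minimal, self-contained sketch in C. The Ring, Reader, and Msg types, the message-type codes, and picker_module() are all hypothetical stand-ins: real Earthworm rings live in shared memory, wrap around, and identify messages by a logo rather than a single type field. What the sketch does show is a single-function module that reads one ring and writes two kinds of messages onto one output ring, while each downstream reader keeps its own position and selects only the type it wants.

```c
#define RING_SLOTS 16

/* Hypothetical message types; real Earthworm messages carry a logo
 * (installation, module, type) instead of this single field. */
enum { TYPE_AMP = 1, TYPE_PICK = 2, TYPE_ALARM = 3 };

typedef struct {
    int  type;
    char sta[8];    /* station code */
    int  value;     /* e.g. an amplitude */
} Msg;

/* Hypothetical in-memory stand-in for a transport ring: append-only,
 * no wrap-around, unlike a real shared-memory ring. */
typedef struct {
    Msg slot[RING_SLOTS];
    int count;
} Ring;

/* Each reading module keeps its own position, so several modules can
 * read the same ring independently. */
typedef struct {
    int next;
} Reader;

/* Append a message; returns -1 if the ring is full. */
int ring_put(Ring *r, const Msg *m)
{
    if (r->count == RING_SLOTS)
        return -1;
    r->slot[r->count++] = *m;
    return 0;
}

/* Return 1 and copy the next message of type 'want' into *m, skipping
 * (for this reader only) messages of other types; return 0 if none. */
int ring_get(const Ring *r, Reader *rd, int want, Msg *m)
{
    while (rd->next < r->count) {
        const Msg *cur = &r->slot[rd->next++];
        if (cur->type == want) {
            *m = *cur;
            return 1;
        }
    }
    return 0;
}

/* Single-function module: one input ring, one output ring. Amplitudes at
 * or above pick_level become picks; those at or above alarm_level also
 * produce an alarm. Both kinds go to the same output ring. Returns the
 * number of messages that could not be written. */
int picker_module(const Ring *in, Reader *rd, Ring *out,
                  int pick_level, int alarm_level)
{
    Msg m;
    int dropped = 0;

    while (ring_get(in, rd, TYPE_AMP, &m)) {
        if (m.value >= pick_level) {
            Msg p = m;
            p.type = TYPE_PICK;
            dropped += (ring_put(out, &p) != 0);
        }
        if (m.value >= alarm_level) {
            Msg a = m;
            a.type = TYPE_ALARM;
            dropped += (ring_put(out, &a) != 0);
        }
    }
    return dropped;
}
```

A call such as picker_module(&in, &rd, &out, 100, 1000) needs no knowledge of who reads `out`; an alarm module and a pick-archiving module could both read that one ring, each asking only for its own message type, and either could in turn feed another module. That composability is the 'erector set' property described above.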
OS kernel functions: Given our limited resources, the principle is to run on the two most dominant platforms of the day; currently these are NT and Solaris. To date, Earthworm has survived five operating systems. In the process, a tradition has developed of using wrapper routines for system-specific calls and producing a separate version of each such routine for each operating system. These routines are kept in system-specific libraries (currently .../src/libsrc/solaris and .../src/libsrc/winnt), and the correct library is specified at link time via environment variables. For example, the routine sleep_ew() wraps the NT Sleep() call and the Solaris nanosleep() call, so modules that use sleep_ew() can run on either system.
Preserving this portability, of course, means that new wrapper routines must be written as needed.
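A minimal sketch of such a wrapper in C, modeled on the sleep_ew() example. The name sleep_ms(), its millisecond argument, and its return convention are assumptions for illustration, not the actual sleep_ew() interface. Earthworm keeps each operating system's version in its own system-specific library and selects it at link time; for compactness, this sketch folds both versions into one file with #ifdef.

```c
#if !defined(_WIN32)
#define _POSIX_C_SOURCE 199309L   /* expose nanosleep() on POSIX systems */
#endif

#if defined(_WIN32)
#include <windows.h>
#else
#include <errno.h>
#include <time.h>
#endif

/* Hypothetical portable millisecond sleep. Modules call sleep_ms() and
 * never touch the OS-specific primitive directly.
 * Returns 0 on success, -1 on error. */
int sleep_ms(unsigned int ms)
{
#if defined(_WIN32)
    Sleep((DWORD)ms);                    /* Win32 Sleep() takes milliseconds */
    return 0;
#else
    struct timespec req, rem;

    req.tv_sec  = (time_t)(ms / 1000u);
    req.tv_nsec = (long)(ms % 1000u) * 1000000L;

    /* nanosleep() can be cut short by a signal; resume with the remaining
     * time so the caller still sleeps the full interval. */
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
#endif
}
```

A module simply calls sleep_ms(1000) and never sees Sleep() or nanosleep(). Porting to another operating system then means writing one new version of the wrapper rather than editing every module that sleeps.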
Start with a template: The Earthworm architecture imposes an overhead burden on each module, including connecting to transport rings, reading and writing messages, reading the parameter file, and error logging. We've found that the most painless way to code this is to start with an existing module that is similar in structure to the one to be written, and modify it as needed. Another approach is to use the "template" module in /src/diagnostic_tools/template. Either way, these tasks are reduced to cut-and-paste operations, and the resulting code is easy for others to maintain.
Earthworm utilities: /src/libsrc/util/ contains various utility routines such as message parsers and format generators. Using these can save much tedious effort, and aids portability.
Error reporting: This is best appreciated by those who get stuck installing and maintaining Earthworm systems. A major frustration is a module that exits with no error message, or with a message that is not meaningful to the people who must maintain the system. This occurs most often during configuration, when the parameter files are being created and debugged, and a shocking amount of installation time can be spent resolving such problems. A more serious case is a module that exits at run time because of an unusual asynchronous condition (e.g., receiving an oversize message) without adequately reporting the cause. If such events are rare, finding the problem can become extremely difficult, and the consequences of such failures are potentially very serious.
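One way to follow this advice when reading parameter files is sketched below in C. The helper parse_int_param() and the HeartbeatSecs command mentioned afterward are hypothetical, not part of the Earthworm library. The point is that every failure message names the file, the line, the command, the offending text, and the accepted range, so whoever is installing the system can fix the configuration without reading the source.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: convert the value of an integer-valued parameter
 * file command. On failure it writes a specific, human-readable message
 * to 'log' and returns -1, leaving *out unchanged; on success it stores
 * the value in *out and returns 0. */
int parse_int_param(FILE *log, const char *file, int lineno,
                    const char *cmd, const char *text,
                    long min, long max, long *out)
{
    char *end;
    long  val;

    if (text == NULL || *text == '\0') {
        fprintf(log, "%s, line %d: command <%s> is missing its value; "
                "expected an integer in [%ld, %ld]\n",
                file, lineno, cmd, min, max);
        return -1;
    }

    errno = 0;
    val = strtol(text, &end, 10);
    if (errno == ERANGE || end == text || *end != '\0') {
        fprintf(log, "%s, line %d: command <%s>: '%s' is not a valid "
                "integer\n", file, lineno, cmd, text);
        return -1;
    }

    if (val < min || val > max) {
        fprintf(log, "%s, line %d: command <%s>: value %ld is outside the "
                "allowed range [%ld, %ld]\n",
                file, lineno, cmd, val, min, max);
        return -1;
    }

    *out = val;
    return 0;
}
```

A module would call a routine like this for each numeric command while reading its parameter file (for a hypothetical "HeartbeatSecs 30" line, for instance) and exit with a non-zero status if it fails. The problem then surfaces at start-up with an explicit message, rather than as a silent exit or a puzzling failure later.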
"Works as long as there are no earthquakes": There are numerous horror stories of systems that performed well for a long time and then failed when a major earthquake occurred. Some classic problem areas include:
Memory leaks: There have been modules that passed various tests but caused the system to hang after weeks of running by slowly draining available memory. This, together with the event-driven failure mode above, makes run-time memory requests a very dangerous practice. It is far better to do all malloc() calls at start-up and 'waste' memory than to crash the system later.
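The start-up allocation policy can be sketched in C as a fixed-size buffer pool. BufPool and its functions are hypothetical illustrations, not Earthworm library routines. All memory is obtained once in pool_init(); at run time, pool_get() and pool_put() only move pointers on a free stack, so memory use cannot creep upward over weeks of operation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size buffer pool: all memory is obtained once, at
 * start-up; pool_get()/pool_put() never call malloc() or free(). */
typedef struct {
    char   *storage;    /* one contiguous block of nbuf * bufsize bytes */
    char  **free_list;  /* stack of pointers to currently unused buffers */
    size_t  nfree;      /* number of entries on the free stack */
    size_t  nbuf;       /* total number of buffers */
    size_t  bufsize;    /* size of each buffer in bytes */
} BufPool;

/* Call once at start-up; sizes would typically come from the parameter
 * file. Returns 0 on success, -1 if the memory is not available. */
int pool_init(BufPool *p, size_t nbuf, size_t bufsize)
{
    if (bufsize != 0 && nbuf > SIZE_MAX / bufsize)
        return -1;                              /* size would overflow */

    p->storage   = malloc(nbuf * bufsize);
    p->free_list = malloc(nbuf * sizeof *p->free_list);
    if (p->storage == NULL || p->free_list == NULL) {
        free(p->storage);
        free(p->free_list);
        return -1;
    }
    for (size_t i = 0; i < nbuf; i++)
        p->free_list[i] = p->storage + i * bufsize;
    p->nfree   = nbuf;
    p->nbuf    = nbuf;
    p->bufsize = bufsize;
    return 0;
}

/* Run time: take a buffer, or NULL if the pool is exhausted. */
char *pool_get(BufPool *p)
{
    return p->nfree > 0 ? p->free_list[--p->nfree] : NULL;
}

/* Run time: give back a buffer previously obtained from pool_get(). */
void pool_put(BufPool *p, char *buf)
{
    if (buf != NULL && p->nfree < p->nbuf)
        p->free_list[p->nfree++] = buf;
}
```

When the pool is exhausted, pool_get() returns NULL; the caller can then log the condition and drop or defer the message, which is a visible, bounded failure rather than a slow drain on system memory. The sketch does not verify that a returned buffer came from the pool, so callers must pass back only pointers obtained from pool_get().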
Given the 'community' objective of the Earthworm distribution, it is crucial that the code be easily understood, modified, and maintained by others. People with various skill levels and available time should be able to understand and modify the distributed code. Considerations here include: