Troubleshooting system-level problems

System crashes

A ``system crash'' occurs when the system goes down without unmounting filesystems or performing other cleanup operations. This is also called an ``abnormal shutdown.'' There are three types of system crash:

system panic
The system ``panics'' when it encounters a hardware problem or kernel inconsistency so severe that the system cannot continue functioning.

power failure
If the power to the system fails, even briefly, the system crashes.

operator crash
If the system ``hangs'' due to operator error, you usually need to reset the system and reboot it to resolve the situation. Recovering from an operator-induced crash is similar to recovering from a system panic.

Note that a system with files NFS-mounted from another system that is down may behave as if it is hung. In that case, rebooting the other system or, if possible, unmounting its filesystems from your system will usually solve the problem.
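
One way to tell an apparent hang caused by an unreachable NFS server from a genuine hang is to probe the mount point with a time limit. The following is a minimal sketch only, assuming a host where Python 3 is available (it is not part of the release described here); the function name path_responds and the five-second default are illustrative choices. The probe runs in a child process so that a blocked call cannot hang the caller.

```python
import subprocess
import sys


def path_responds(path, timeout_seconds=5.0):
    """Return True if a stat() of path finishes (with success or an error)
    within timeout_seconds, and False if it is still blocked after that."""
    # Run the stat() in a separate process: if the filesystem does not
    # answer, only the child blocks, not the caller.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        child.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked on an unreachable server
        # may not terminate until the server returns.
        child.kill()
        return False
    return True


if __name__ == "__main__":
    # Hypothetical usage: pass the mount point to probe as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    state = "responds" if path_responds(target) else "did not respond in time"
    print(target, state)
```

A result of False for an NFS mount point, while local paths still respond, suggests that the remote system is the cause rather than a local hang.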

When the system is shut down normally, the shutdown(ADM) program stops all daemons, kills the active processes, unmounts any mounted filesystems, runs the sync(ADM) command, and tells init(M) to bring the system down to the appropriate state (either single-user or ``safe to power off''), or to reboot.

If the system goes down before this shutdown procedure completes, filesystems may be corrupted, resulting in lost data. Some data may be lost because the buffer cache was not flushed to disk. The operating system flushes the buffers to disk fairly regularly, so the amount of data lost due to an abnormal system shutdown should be minimal. However, filesystem corruption is a common problem. If the root filesystem is corrupted, the system may not function properly.
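
The buffering described above is also why applications that cannot afford to lose recent writes ask for them to be flushed explicitly. The following is a minimal sketch, assuming a host with Python 3; the function name write_durably and the file name used in the example are illustrative. It uses os.fsync() to request that the kernel write its buffers for one file to disk before the function returns.

```python
import os
import tempfile


def write_durably(path, data):
    """Write bytes to path and request that they reach the disk before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move data from the program's own buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to write this file's buffers to disk


if __name__ == "__main__":
    # Illustrative usage with a scratch file in a temporary directory.
    scratch = os.path.join(tempfile.mkdtemp(), "journal.dat")
    write_durably(scratch, b"entry recorded before any crash\n")
    print("wrote", scratch)
```

Data written this way does not depend on the periodic flush, at the cost of slower writes. It does not protect against corruption of the filesystem structures themselves.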

When the system crashes for any reason, record the relevant data in the system log book and then reboot the system.


© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003