Troubleshooting system-level problems

Checking error messages

In most cases, the PANIC error message is displayed on the console (line 8 in the example). Look in the /usr/adm/messages file for other messages that preceded the panic and may contain valuable information about what went wrong.

Kernel error messages report driver errors and errors in other parts of the kernel, such as the process scheduling subsystem and the file subsystem. Monitoring these messages regularly is an important step in preventing serious system problems; studying them after a system problem develops is an important part of troubleshooting the system. Reading the PANIC error message can give valuable information about the cause of the system failure.

Kernel error messages usually have the format:

   class: [ddname:] [routine] message

class is usually one of the following: CONFIG, NOTICE, WARNING, FATAL, or PANIC. A number of these messages are documented on the messages(M) manual page.

Messages that have INIT for class are generated by the init(M) process and are documented on the init(M) manual page. These messages usually occur during system initialization. A few of the init messages indicate error conditions, but many more are just informational messages.

ddname names the driver or subsystem having problems. The peripheral itself is usually identified by a pair of numbers of the form major/minor, which is the device number of the peripheral where the error occurred. The routine element indicates the subsystem that detected the condition. The ddname and routine portions of the error message are included mostly to help support staff track down difficult system problems.
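The message format described above can be split into its parts mechanically. The following is a minimal Python sketch, not a vendor-supplied tool. It assumes each message occupies one line, that ddname (when present) is a single word followed by a colon, and that any major/minor pair appears as two integers separated by a slash. The names parse_kernel_message and KernelMessage are hypothetical, and the sample line is made up for illustration. Because the routine element cannot be reliably told apart from the message text, it is left inside the text field.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Message classes named in this section (INIT comes from the init process).
CLASSES = ("CONFIG", "NOTICE", "WARNING", "FATAL", "PANIC", "INIT")

# class: [ddname:] rest-of-message
_LINE = re.compile(
    r"^(?P<cls>" + "|".join(CLASSES) + r"):\s*"
    r"(?:(?P<ddname>[A-Za-z_]\w*):\s+)?"   # optional driver/subsystem name
    r"(?P<rest>.*)$"
)

# Assumption: a device number looks like "1/42" (major/minor).
# Other slash-separated numbers, such as dates, would also match.
_DEV = re.compile(r"\b(\d+)/(\d+)\b")


class KernelMessage(NamedTuple):
    cls: str                            # CONFIG, NOTICE, WARNING, ...
    ddname: Optional[str]               # driver or subsystem, if given
    text: str                           # routine (if any) plus message
    device: Optional[Tuple[int, int]]   # (major, minor), if found


def parse_kernel_message(line: str) -> Optional[KernelMessage]:
    """Return the parsed message, or None if the line has no known class."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    dev = _DEV.search(m.group("rest"))
    device = (int(dev.group(1)), int(dev.group(2))) if dev else None
    return KernelMessage(m.group("cls"), m.group("ddname"),
                         m.group("rest"), device)


if __name__ == "__main__":
    # Made-up sample line, not taken from a real system.
    print(parse_kernel_message("WARNING: hd: example error on dev 1/42"))
```

A line that does not begin with one of the listed classes returns None, so the function can be used as a filter over mixed log output.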

PANIC messages are not usually logged in the /usr/adm/messages file, but other messages that occurred before the panic are usually logged and can provide valuable information about the cause of the panic. The PANIC message is usually displayed on the system console or can be viewed with the panic function of the crash(ADM) command.
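Because the messages logged before a panic are often the useful ones, it can help to pull the most recent kernel messages out of the log. The sketch below is one possible way to do that in Python, under the assumption that each logged message starts at the beginning of a line with its class name and a colon. The function names and the default limit of 20 are hypothetical; only the /usr/adm/messages path comes from this section.

```python
from collections import deque
from typing import Iterable, List

# Classes that may be logged before a panic; PANIC itself is usually not logged.
SEVERITIES = ("CONFIG", "NOTICE", "WARNING", "FATAL")
_PREFIXES = tuple(name + ":" for name in SEVERITIES)


def last_kernel_messages(lines: Iterable[str], limit: int = 20) -> List[str]:
    """Return the last `limit` lines that start with a known message class."""
    recent = deque(maxlen=limit)   # older entries fall off automatically
    for line in lines:
        text = line.strip()
        if text.startswith(_PREFIXES):
            recent.append(text)
    return list(recent)


def read_messages_file(path: str = "/usr/adm/messages",
                       limit: int = 20) -> List[str]:
    """Convenience wrapper that reads the log file named in this section."""
    with open(path, errors="replace") as fh:
        return last_kernel_messages(fh, limit)


if __name__ == "__main__":
    for message in read_messages_file():
        print(message)
```

Keeping the filtering in last_kernel_messages, separate from the file access, makes it possible to run the same logic on a copy of the log taken from another machine.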

In the example console dump in ``Console panic information'', the PANIC message does not follow the standard format. This usually means that the error is from the main kernel code and not a driver. In this case, the message includes a definition of the type of trap that caused the panic. The meaning of these trap numbers is defined in the /usr/include/sys/trap.h file and documented on the messages(M) manual page. In this case, it is a paging violation, which usually occurs when an errant pointer is dereferenced in a driver or other kernel code.
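Looking up a trap number by hand means searching the header for the matching definition. The following Python sketch shows one way to build a number-to-name table from a header, assuming the trap types are declared as simple #define lines with integer values; the actual layout of /usr/include/sys/trap.h is not shown in this section and may differ. The function names are hypothetical, and no real macro names are assumed.

```python
import re
from typing import Dict, List

# Matches lines such as "#define NAME 14" or "#define NAME 0x0E".
_DEFINE = re.compile(r"^\s*#\s*define\s+(\w+)\s+(0[xX][0-9A-Fa-f]+|\d+)\b")


def c_int(literal: str) -> int:
    """Convert a C integer literal (hex, octal, or decimal) to an int."""
    if literal.lower().startswith("0x"):
        return int(literal, 16)
    if len(literal) > 1 and literal.startswith("0"):
        return int(literal, 8)      # leading zero means octal in C
    return int(literal, 10)


def trap_names(header_text: str) -> Dict[int, List[str]]:
    """Map each numeric value to the macro names defined with that value."""
    table: Dict[int, List[str]] = {}
    for line in header_text.splitlines():
        m = _DEFINE.match(line)
        if m is None:
            continue
        try:
            value = c_int(m.group(2))
        except ValueError:
            continue                # skip malformed literals
        table.setdefault(value, []).append(m.group(1))
    return table


if __name__ == "__main__":
    with open("/usr/include/sys/trap.h", errors="replace") as fh:
        table = trap_names(fh.read())
    # Replace the literal below with the trap type shown in the PANIC message.
    print(table.get(c_int("0x0E"), ["(no matching definition)"]))
```

The table may also pick up unrelated numeric macros from the header, so a result is a starting point for reading the header and the messages(M) manual page, not a replacement for them.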

© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003