Tuning memory resources

Case study: memory-bound workstation

In this example, a user is given a workstation running the SCO OpenServer Desktop. The machine has had one previous user, who may have made undocumented changes to the system's configuration. The workstation's new owner is given the root password and is made responsible for its day-to-day administration. The performance of the machine seems generally adequate, although it becomes noticeably slower when several different applications are started at the same time.

System configuration

The configuration of the system is:

Uniprocessor 80486DX running at 33MHz.
ISA bus.
24MB of RAM.
48MB of swap space.
One 434MB IDE disk drive.
One 16-bit Ethernet card with a 16KB buffer.
VESA bus graphics adapter with 1MB of Dynamic Random Access Memory (DRAM).
SVGA monitor configured to run at a display resolution of 1024x768 with 256 colors.

The user's home area and applications are accessed via NFS-mounted filesystems on a file server maintained by the company's Information Services department. The local area network is lightly loaded. There are occasional bursts of NFS traffic when users access remote files. There are no funds available for upgrading the workstation.

Defining a performance goal

The user wishes to become familiar with how their workstation has been set up and to improve its performance if possible. They have only a few hours available to perform this task.

Collecting data

The user collects the following settings for kernel parameters by running the Hardware/Kernel Manager:

GPGSLO: 200 -- the number of memory pages available when the page stealing daemon, vhand, becomes active to release pages for use.
GPGSHI: 300 -- the target number of pages for vhand to release for use.
MAXUP: 25 processes per user are allowed.
NBUF: 3000 1KB buffers are requested at system startup.
NHBUF: 256 buffer hash queues are reserved.
NSTREAM: 256 stream heads are configured.
NSTRPAGES: maximum 500 4KB pages of memory can be dynamically allocated for use by STREAMS message buffers.
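The memory implied by the two largest settings in this list can be worked out from the figures given. The following is a minimal Python sketch; the variable names are illustrative only, and the 1KB buffer size and 4KB page size are those stated in the list above.

```python
# Kernel parameter values recorded from the Hardware/Kernel Manager.
NBUF = 3000        # 1KB buffers requested at system startup
NSTRPAGES = 500    # maximum 4KB pages for STREAMS message buffers

BUFFER_KB = 1      # size of one buffer, as stated in the list
PAGE_KB = 4        # size of one page, as stated in the list

buffer_cache_kb = NBUF * BUFFER_KB
streams_max_kb = NSTRPAGES * PAGE_KB

print(f"buffer cache requested: {buffer_cache_kb}KB ({buffer_cache_kb / 1024:.2f}MB)")
print(f"STREAMS dynamic ceiling: {streams_max_kb}KB")
```

The buffer cache request is close to 3MB and the STREAMS ceiling is 2000KB.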

The user examines the file /usr/adm/messages and notes the following information:

The kernel requires 7MB of memory.
The buffer cache occupies 3MB of kernel memory.
There are 17MB of memory available for user processes.

In addition the user notes the following facts from various configuration files on the system:

The TCP/IP interface definition for the Ethernet card is defined to use back-to-back packets and full frames in the file /etc/tcp.
Four NFS biod daemons are configured to run in the file /etc/nfs.

Next, the user starts up all the usual applications that they run on the desktop -- scomail, xclock, editing files in two windows, browsing the Web and viewing documentation with DocView, running two sessions on remote machines, and running a word processor and spreadsheet. They are unable to start any more processes than this and a message is displayed to this effect. They then switch to another multiscreen, log in as root, and start to record sar data at 30-second intervals to a temporary file:

sar -o /tmp/sar_op 30 120

They then continue to use the system for an hour before examining the results.
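The two numeric arguments to sar are the sampling interval in seconds and the number of samples, so the recording covers the hour of use. A quick check of the arithmetic, as a minimal sketch with illustrative variable names:

```python
interval_s = 30   # seconds between samples
samples = 120     # number of samples requested

total_s = interval_s * samples
print(f"recording covers {total_s} seconds ({total_s // 60} minutes)")
```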

Formulating a hypothesis

There are several things that immediately strike the user as less than optimal about this system configuration:

Only 25 processes are available for each user even though this system only has one user. This is probably the reason why only a limited number of windows could be started.
The number of hash queues configured (NHBUF) is much lower than the number of system buffers (NBUF). There is approximately 1 hash queue for every 12 buffers; this means that each queue will contain 12 buffers on average. On a single processor system, the system will automatically allocate at least one hash queue for every two buffers if NHBUF is set to 0.
The user suspects that too much memory is allocated to buffers that could more usefully be allocated to user processes. Most disk access is remote, so most of the load falls on the NFS file server. NFS remote writes to files are write-through (synchronous) and are not cached on either the client or the server machine. Remote reads are cached locally and have unknown requirements. As most of the applications that the user runs are idle while not being used, it is unlikely that the STREAMS subsystem is severely loaded.
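The hash queue observation can be checked with simple arithmetic. This is a minimal sketch using the NBUF and NHBUF values collected earlier; the variable names are illustrative, and the "one queue per two buffers" figure is the automatic minimum described above for a single processor system.

```python
NBUF = 3000    # configured system buffers
NHBUF = 256    # configured buffer hash queues

# Average number of buffers that must be searched per hash queue.
buffers_per_queue = NBUF / NHBUF

# Minimum number of queues the system would allocate with NHBUF set to 0
# (at least one hash queue for every two buffers).
auto_min_queues = NBUF // 2

print(f"buffers per hash queue now: {buffers_per_queue:.1f}")
print(f"queues with automatic sizing: at least {auto_min_queues}")
```

The result is about 11.7 buffers per queue, which rounds to the 12 quoted above.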

Getting more specifics

The sar -u report is extracted from the file /tmp/sar_op:

sar -u -f /tmp/sar_op

This report shows the system's usage of the CPU:

   09:00:00    %usr    %sys    %wio   %idle
   ...
   09:15:30       6       5       1      88
   09:16:00       5       4       1      90
   09:16:30       6       3       0      91
   ...

It is apparent that the system spends most of its time idle with plenty of spare processing capacity. The low waiting on I/O (%wio) figures do not indicate any bottleneck in the I/O subsystems.
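The conclusion can be confirmed by averaging the sampled rows. The sketch below is illustrative only: it uses the three rows shown in the report rather than the full hour of data, and the variable names are not part of sar.

```python
# (%usr, %sys, %wio, %idle) for the three samples shown in the report.
samples = [
    (6, 5, 1, 88),
    (5, 4, 1, 90),
    (6, 3, 0, 91),
]

# Each row should account for 100% of CPU time.
row_totals = [sum(row) for row in samples]

mean_idle = sum(row[3] for row in samples) / len(samples)
peak_wio = max(row[2] for row in samples)

print(f"row totals: {row_totals}")
print(f"mean %idle: {mean_idle:.1f}")
print(f"peak %wio:  {peak_wio}")
```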

Memory investigation

The user next runs sar -r to examine the system's usage of memory:

   09:05:00 freemem freeswp
   ...
   09:15:30     314   73166
   09:16:00     302   72902
   09:16:30     308   72888
   ...

This shows that there is plenty of swap space (freeswp) but that the system is running low on physical memory (freemem is close to GPGSHI) so the page handling (vhand) and the swapper (sched) daemons may be active. (See ``Tuning memory resources'' for more information about the circumstances under which these daemons become active.)
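The comparison of freemem with the paging thresholds can be sketched as follows. This is an illustration only; it assumes that freemem is reported in 4KB pages (the page size quoted for NSTRPAGES), and the variable names are hypothetical.

```python
GPGSLO = 200     # free pages at which vhand becomes active
GPGSHI = 300     # target number of free pages for vhand
PAGE_KB = 4      # assumed page size for the freemem column

freemem_samples = [314, 302, 308]   # values from the sar -r report

for pages in freemem_samples:
    print(f"freemem {pages} pages ({pages * PAGE_KB}KB): "
          f"{pages - GPGSHI} above GPGSHI, {pages - GPGSLO} above GPGSLO")

lowest = min(freemem_samples)
lowest_kb = lowest * PAGE_KB
```

Under that assumption, the lowest sample leaves only about 1.2MB of free memory, just two pages above GPGSHI.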

The sar -q report shows that no runnable processes are swapped out (no value is displayed for swpq-sz):

   09:05:00 runq-sz %runocc swpq-sz %swpocc
   ...
   09:15:30     1.3       2
   09:16:00     1.0       3
   09:16:30     1.1       2
   ...

The sar -w report shows that the swpot/s field is greater than zero; this is evidence that the system is swapping out to the swap area:

   09:05:00 swpin/s bswin/s swpot/s bswot/s pswch/s
   ...
   09:15:30    0.05     0.2    0.01     0.2      72
   09:16:00    0.07     0.5    0.02     0.4      55
   09:16:30    0.03     0.2    0.01     0.3      43
   ...
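To put the swap transfer rates in perspective, they can be converted to bytes per second. This sketch assumes that bswin/s and bswot/s are counted in 512-byte blocks; that unit is an assumption here and should be checked against the sar manual page for the release in use.

```python
BLOCK_BYTES = 512   # assumed unit for the bswin/s and bswot/s columns

# (bswin/s, bswot/s) for the three samples shown in the report.
samples = [(0.2, 0.2), (0.5, 0.4), (0.2, 0.3)]

peak_in_bytes = max(b_in for b_in, _ in samples) * BLOCK_BYTES
peak_out_bytes = max(b_out for _, b_out in samples) * BLOCK_BYTES

print(f"peak swap-in:  {peak_in_bytes:.0f} bytes/s")
print(f"peak swap-out: {peak_out_bytes:.0f} bytes/s")
```

Under that assumption the traffic is only a few hundred bytes per second: swapping is occurring, but at a low rate.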

The system does not appear to be very short of resources apart from memory for user processes. There is plenty of spare CPU capacity and no immediately apparent problem with I/O.

It is possible that a user process is grabbing too much memory for itself. In this instance, running the command ps -el shows that no process has a swappable virtual memory size (SZ field) greater than that of the X server, and most are much smaller. When tuning a system, it is always worth checking to see which processes are using the most swappable memory (SZ field) and the most time (TIME field).

The next step is to see if the amount of memory used by the buffer cache can be reduced.

I/O investigation

The user runs sar -b to investigate buffer cache hit rates:

   09:05:00 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
   ...
   09:15:30       1      17      97       6      18      68       0       0
   09:16:00       1      25      95       1       3      67       0       0
   09:16:30       1      18      96       3       9      67       0       0
   ...
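The hit-rate columns are derived from the physical and logical transfer counts: the percentage of logical requests that did not need a physical transfer. The sketch below applies that relationship to the write columns of the three rows shown. The function name is illustrative, and because sar displays rounded per-second rates, recomputed values may differ from the reported ones by a point or so.

```python
def hit_rate(physical, logical):
    """Percentage of logical requests satisfied by the buffer cache."""
    if logical == 0:
        return None   # no requests in the interval, so no meaningful rate
    return (1 - physical / logical) * 100

# (bwrit/s, lwrit/s, reported %wcache) for the three samples shown.
write_samples = [(6, 18, 68), (1, 3, 67), (3, 9, 67)]

computed = [hit_rate(bwrit, lwrit) for bwrit, lwrit, _ in write_samples]
for value, (_, _, reported) in zip(computed, write_samples):
    print(f"computed {value:.1f}%  reported {reported}%")
```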

The hit rates are quite high at approximately 96% for reads and 67% for writes. The number of blocks being transferred is quite small. As most of the files being accessed are remote, local disk activity should be low apart from paging in of program text and data, and any paging out activity. This is investigated using sar -d:

   09:05:00 device   %busy   avque   r+w/s  blks/s  avwait  avserv
   ...
   09:15:30 wd-0      1.42    2.39    2.13    6.82    7.50   10.42
   09:16:00 wd-0      2.03    2.97    1.37    3.64    7.00   13.78
   09:16:30 wd-0      1.16    2.29    1.70    5.95    9.48   12.23
   ...

The disk appears not to be busy so it should be able to cope with a decreased cache hit rate. If memory is released by decreasing the size of the buffer cache, this may also lessen any paging out activity and so decrease disk activity.
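The disk figures can be summarized in the same way. This is a minimal sketch over the three rows shown for wd-0; it assumes that avwait and avserv are both in milliseconds, so that their sum approximates the time a request spends queued and being serviced. The variable names are illustrative.

```python
# (%busy, avque, r+w/s, blks/s, avwait, avserv) for device wd-0.
samples = [
    (1.42, 2.39, 2.13, 6.82, 7.50, 10.42),
    (2.03, 2.97, 1.37, 3.64, 7.00, 13.78),
    (1.16, 2.29, 1.70, 5.95, 9.48, 12.23),
]

peak_busy = max(row[0] for row in samples)

# Queue wait plus service time per request (assumed to be milliseconds).
response_ms = [row[4] + row[5] for row in samples]

print(f"peak %busy: {peak_busy}")
print("wait + service per request:", [f"{r:.2f}" for r in response_ms])
```

With the disk busy about 2% of the time at most, there is ample headroom for the extra physical I/O that a smaller buffer cache may cause.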

STREAMS usage investigation

Finally, the user runs netstat -m to investigate how STREAMS is using memory:

   streams allocation:
                           config   alloc    free     total     max    fail
   streams                    256     113     143      6270     124       0
   queues                     566     394     172     16891     404       0
   mblks                      271     102     169    179326     283       0
   buffer headers             442     391      51    155964     475       0
   class  1,     64 bytes    1288     276       8     50289    1288       0
   class  2,    128 bytes     796     171      25     18668     796       0
   class  3,    256 bytes     364      50      14      9174     364       0
   class  4,    512 bytes     132      12      20      3334     132       0
   class  5,   1024 bytes      54       5       9      1904      54       0
   class  6,   2048 bytes      84      62      22      1622      84       0
   class  7,   4096 bytes       8       8       0       293       8       0
   class  8,   8192 bytes       1       0       1       113       1       0
   class  9,  16384 bytes       1       0       1        21       1       0
   class 10,  32768 bytes       0       0       0         0       0       0
   class 11,  65536 bytes       0       0       0         0       0       0
   class 12, 131072 bytes       0       0       0         0       0       0
   class 13, 262144 bytes       0       0       0         0       0       0
   class 14, 524288 bytes       0       0       0         0       0       0
   total configured streams memory:2000.00KB
   streams memory in use: 205.29KB
   maximum streams memory used: 686.34KB

This report shows that the kernel's peak usage of memory for STREAMS was about 700KB. Its current usage of physical memory is about 200KB which is well below the maximum 2MB of memory that can be dynamically allocated.
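Expressed as a fraction of the configured ceiling, the STREAMS figures look like this. The sketch simply divides the three totals printed at the end of the netstat -m report; the variable names are illustrative.

```python
configured_kb = 2000.00   # total configured streams memory
in_use_kb = 205.29        # streams memory in use
peak_kb = 686.34          # maximum streams memory used

in_use_pct = in_use_kb / configured_kb * 100
peak_pct = peak_kb / configured_kb * 100

print(f"in use: {in_use_pct:.1f}% of ceiling")
print(f"peak:   {peak_pct:.1f}% of ceiling")
```

Current usage is about a tenth of the ceiling, and even the peak is roughly a third of it.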

The usage of stream heads reported in the streams column shows that the 256 configured for use are sufficient. The number configured could be reduced using the NSTREAM parameter but this releases only 80 bytes of memory per stream head.
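The saving from reducing NSTREAM can be estimated from the 80 bytes per stream head quoted above. In this sketch the new value of 128 is purely hypothetical, chosen only because it is above the peak of 124 stream heads shown in the max column of the report.

```python
NSTREAM = 256             # stream heads currently configured
peak_used = 124           # "max" column for streams in netstat -m
BYTES_PER_HEAD = 80       # memory released per stream head removed

new_nstream = 128         # hypothetical lower setting, still above the peak

saved_bytes = (NSTREAM - new_nstream) * BYTES_PER_HEAD
print(f"memory released: {saved_bytes} bytes ({saved_bytes / 1024:.0f}KB)")
```

A saving of about 10KB is negligible next to the megabytes held by the buffer cache, which is why the buffer cache is the more promising target for tuning.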

Making adjustments to the system

Firstly, the user increases MAXUP to 128 as this is only a configuration limitation. They will now be able to run many more processes than before.

To release more memory for use by user processes, they reduce the memory allocated to the buffer cache to 1MB by setting NBUF to 1024. The number of hash queues, determined by the value of NHBUF, is increased to 512 -- that is, half the value of NBUF.

After making these changes, the kernel is relinked and the system rebooted. The size of the kernel has decreased by approximately 2MB, to 5MB, which releases 2MB of memory for user processes.
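The effect of the new settings follows from the parameter values. This is a minimal sketch of the arithmetic, assuming 1KB per buffer as stated when the data was collected; the variable names are illustrative.

```python
BUFFER_KB = 1                 # size of one buffer

old_nbuf, new_nbuf = 3000, 1024
new_nhbuf = 512

freed_kb = (old_nbuf - new_nbuf) * BUFFER_KB
buffers_per_queue = new_nbuf / new_nhbuf

print(f"buffer cache now: {new_nbuf * BUFFER_KB}KB")
print(f"memory released:  {freed_kb}KB ({freed_kb / 1024:.2f}MB)")
print(f"buffers per hash queue: {buffers_per_queue:.0f}")
```

The 1976KB released by the smaller buffer cache accounts for the reduction of approximately 2MB in kernel size, and each hash queue now holds two buffers on average instead of twelve.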

The user continues to monitor the system in everyday use, particularly noting the impact of the changes on memory, buffer cache, and disk usage.


© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003