This chapter describes how the Netscape Proxy Server caches documents. It also describes how you can configure the cache by using the online forms and how the cache directory structure is maintained automatically by the cache monitor and cache manager.
Proxy document retrieval
Dispersing files in the cache
The proxy server uses a specific algorithm to determine the directory where a document should be stored. This algorithm ensures equal dispersion of documents in the base directories, so the directories contain a small and nearly equal number of documents. Equal dispersion is important for two reasons:
To set cache specifics:
Note
Setting the specifics for a large cache is time-consuming and may cause the administration interface to time out. Therefore, if you are creating a large cache, use the command-line utilities to set cache specifics.
Note
The proxy does not have to record URLs to function properly. This feature exists so that the proxy administrator can view which URLs are in the cache. Continually recording URLs into a list may reduce the proxy's performance. To avoid this, you can disable URL recording on the Cache Specifics form and view or manage URLs in the cache by using the command-line program extras/proxy/urldbgen. This program generates the URL list on command and does not affect the proxy's performance. See "Repairing the cache URL list" for more information about urldbgen.
Setting the cache size
Cache size is the maximum size to which the cache is allowed to grow. The maximum cache size is 64 GB. The amount of disk space available for the proxy cache has a considerable effect on cache performance. If the cache is too small, the cache manager program must remove cached documents more often to make room on the disk, and documents must be retrieved from content servers more often, which slows performance.
Large cache sizes are best because the more cached documents, the less the network traffic load and the faster the response time the proxy provides. Also, the cache manager removes cached documents if users no longer need them. Barring any file system limitations, cache size can never be too large; the excess space simply remains unused.
Netscape's proxy caching is designed to work efficiently at any size up to 64 GB. The exact cache size you choose depends on the number of people using your proxy server. For a single-user cache, 20 to 50 MB is usually enough. For a proxy that caches a multitude of documents, you might need to allocate an entire 2 GB to 4 GB disk partition for the cache. You can also split the cache across multiple disk partitions. For more information on partitions, see "Adding and modifying cache partitions".
You can set the cache size on the Cache Specifics form.
Note
You might encounter problems with caching if the file system where the cache
root resides has less disk space than the cache size you specify. Also, note that
expanding the cache size requires a hard restart (shutdown/restart) for the
changes to take effect.
Warning!
Changing the cache structure after installation requires that you reformat the structure and relocate existing files, so any alteration is time-consuming. If you aren't sure what cache size to use, use 2 GB as the default value in the installation forms (this default can hold more than 2 GB of data and can be used with 3 to 5 GB caches).
Editing the cache capacity
You can edit the cache capacity through the Cache Specifics form as well as on the Cache Administration Operations form.
For more information on editing the cache capacity, see "Setting the cache capacity".
Caching HTTP documents
Internally, caching HTTP documents differs from caching FTP and Gopher documents. HTTP documents offer caching functionality that documents of the other protocols do not. However, by setting up and configuring the cache properly, you can ensure that your proxy server will cache HTTP, FTP, and Gopher documents effectively.
All HTTP documents have a descriptive header section that the proxy server uses to compare and evaluate the document in the proxy cache and the document on the remote server. When the proxy does an up-to-date check on an HTTP document, it sends one request to the server that tells the server to return the document if the version in the cache is out of date. Often, the document hasn't changed since the last request and therefore is not transferred. This method of checking to see if an HTTP document is up-to-date saves bandwidth and decreases latency.
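As a rough illustration of the single-request check described above, the following Python sketch builds a conditional GET carrying the cached copy's modification date. It is only a sketch: the function name build_uptodate_check and the example URL are hypothetical, and the request is built but never sent. If-Modified-Since is the standard HTTP header for this kind of check; how the proxy implements the check internally is not documented here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def build_uptodate_check(url: str, cached_last_modified: datetime) -> Request:
    """Build (but do not send) a conditional GET for a cached document."""
    request = Request(url, method="GET")
    # usegmt=True renders the date in the HTTP date format, ending in "GMT".
    request.add_header(
        "If-Modified-Since",
        format_datetime(cached_last_modified, usegmt=True),
    )
    return request


# Hypothetical cached document, last modified on 1 March 1997 at noon UTC.
req = build_uptodate_check(
    "http://example.com/index.html",
    datetime(1997, 3, 1, 12, 0, 0, tzinfo=timezone.utc),
)
# urllib stores header names capitalized as "If-modified-since".
print(req.get_header("If-modified-since"))
```

A server that honors the header answers with 304 Not Modified and no body when the document is unchanged, which is why the document usually does not need to be transferred again.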
To reduce transactions with remote servers, the proxy server allows you to set a Cache Expiration setting for HTTP documents. The Cache Expiration setting tells the proxy to estimate if the HTTP document needs an up-to-date check before sending the request to the server. The proxy makes this estimate based on the HTTP document's Last-Modified date found in the header.
With HTTP documents, you can also use a Cache Refresh setting. This option specifies whether the proxy always does an up-to-date check (which would override an Expiration setting) or if the proxy waits a specific period of time before doing a check. Figure 7.2 shows what the proxy does if both an Expiration setting and a Refresh setting are specified. Using the Refresh setting decreases latency and saves bandwidth considerably.
Figure 7.2 Using the Cache Expiration and Cache Refresh settings with HTTP
Setting the HTTP cache refresh interval
If you decide that you want your proxy server to cache HTTP documents, you need to determine whether it should always do an up-to-date check for documents in the cache or whether it should check based on a Cache Refresh setting (up-to-date check interval). For HTTP documents, a reasonable refresh interval is 4 to 8 hours. The longer the refresh interval, the less often the proxy connects to remote servers. Even though the proxy doesn't do up-to-date checking during the refresh interval, users can force a refresh by clicking the Reload button in the client (such as Netscape Navigator); this makes the proxy force an up-to-date check with the remote server.
You can set the refresh interval for HTTP documents on either the Cache Specifics form or the Cache Configuration form.
For more information on using the Cache Specifics form, see "Setting cache specifics", and for more information on using the Cache Configuration form, see "Configuring the cache".
Setting the HTTP cache expiration policy
You can also set up your server to check if the cached document is up-to-date by using a last-modified factor or explicit expiration information only.
Explicit expiration information is a header found in some HTTP documents that specifies the date and time when that file will become outdated. Not many HTTP documents use explicit Expires headers, so it's better to estimate based on the Last-Modified header.
If you decide to have your HTTP documents cached based upon the Last-Modified header, you need to select a factor to use in the expiration estimation. The factor is multiplied by the time between the last modification and the time that the document last had an up-to-date check. Smaller values make the proxy check documents more often. For example, suppose you have a document that was last changed ten days ago. If you set the last-modified factor to 0.1, the proxy interprets the factor to mean that the document is probably going to remain unchanged for one day (10 x 0.1 = 1). The proxy would, in that case, return the document from the cache if the document was checked less than a day ago.
On the other hand, in this same example, if the cache refresh setting for HTTP documents is set to less than one day, the proxy does the up-to-date check after that time has elapsed. The proxy always uses the value (cache refresh or cache expiration) that requires that it update the files most frequently.
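The estimate described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the proxy's actual code: the function and parameter names are hypothetical, and the sketch assumes that the estimated lifetime is counted from the last up-to-date check and that the refresh interval, when set, simply caps it.

```python
from datetime import datetime, timedelta
from typing import Optional


def needs_uptodate_check(
    now: datetime,
    last_modified: datetime,
    last_checked: datetime,
    lm_factor: float,
    refresh_interval: Optional[timedelta] = None,
) -> bool:
    """Return True if the cached copy should be checked with the remote server."""
    # Estimated time the document stays unchanged after its last check:
    # factor x (time between last modification and last check).
    fresh_for = (last_checked - last_modified) * lm_factor
    # Whichever setting requires the more frequent check wins.
    if refresh_interval is not None:
        fresh_for = min(fresh_for, refresh_interval)
    return now - last_checked >= fresh_for


# Hypothetical timeline: modified 10 days before the last check, factor 0.1,
# so the document is assumed fresh for 1 day after that check.
modified = datetime(1997, 3, 1)
checked = modified + timedelta(days=10)
six_hours_later = checked + timedelta(hours=6)

print(needs_uptodate_check(six_hours_later, modified, checked, 0.1))
print(needs_uptodate_check(six_hours_later, modified, checked, 0.1,
                           refresh_interval=timedelta(hours=4)))
```

The first call reports that no check is needed, because six hours is less than the estimated one day. The second call reports that a check is needed, because the four-hour refresh interval is the stricter of the two settings.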
You can set the expiration setting for HTTP documents on both the Cache Specifics form and the Cache Configuration form.
For more information on using the Cache Specifics form, see "Setting cache specifics", and for more information on using the Cache Configuration form, see "Configuring the cache".
Reporting HTTP accesses to the remote server
When a document is cached by the Netscape Proxy Server, it can be accessed many times before it is refreshed again.
For the remote server, sending one copy to the proxy that will cache it still only represents one access, or "hit."
The Netscape Proxy Server can count how many times a given document was accessed from the proxy cache between up-to-date
checks, and then send that hit count back to the remote server in an additional HTTP request header (Cache-Info)
the next time the document is refreshed. This way, if the remote server is configured to recognize this type of header,
it receives a more accurate account of how many times a document was accessed.
You can enable HTTP access reporting on the Cache Specifics form. For more information on using the Cache Specifics form,
see "Setting cache specifics".
Caching FTP and Gopher documents
FTP and Gopher protocols do not include a method for checking to see if a document is up-to-date. Therefore, the only way
to optimize caching for FTP and Gopher protocols is to set a Cache Refresh interval. The Cache Refresh interval is the
amount of time the proxy server will wait before retrieving the latest version of the document from the remote server.
If you do not set a Cache Refresh time, the proxy will retrieve these documents even if the versions in the cache are
up-to-date.
Setting FTP and Gopher cache refresh intervals
If you are setting a cache refresh interval for FTP and Gopher protocols, choose one that you consider safe for the documents the proxy gets. For example, if you store information that rarely changes, use a high number (several days). If the data changes constantly, you'll want the files to be retrieved at least every few hours. During the refresh time, you risk sending an out-of-date file to the client. If the interval is short enough (a few hours), you eliminate most of this risk while getting noticeably faster response time.
You can set the cache refresh interval for FTP and Gopher documents on either the Cache Specifics form or the Cache
Configuration form. For more information on using the Cache Specifics form, see "Setting
cache specifics", and for more information on using the Cache Configuration form, see
"Configuring the cache".
Note
If your FTP and Gopher documents vary widely (some change often, others rarely), use the Cache Configuration form to create a separate template for each kind of document (for example, create a template with resources ftp://.*.gif) and then set a refresh interval that is appropriate for that resource.
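To make the idea of per-resource templates concrete, here is a small Python sketch that maps URL patterns to refresh intervals. Everything in it is an assumption made for illustration: the TEMPLATES table, the intervals, the function name, and the matching rules (first match wins, and the pattern must match the whole URL). The pattern also escapes the dot before the extension, which the example resource above does not.

```python
import re
from datetime import timedelta
from typing import List, Optional, Tuple

# Hypothetical templates: (regular expression, refresh interval).
TEMPLATES: List[Tuple[str, timedelta]] = [
    (r"ftp://.*\.gif", timedelta(days=7)),   # images that rarely change
    (r"ftp://.*\.txt", timedelta(hours=2)),  # text files that change often
]


def refresh_interval_for(
    url: str, default: Optional[timedelta] = None
) -> Optional[timedelta]:
    """Return the refresh interval of the first template matching the URL."""
    for pattern, interval in TEMPLATES:
        if re.fullmatch(pattern, url):
            return interval
    return default


print(refresh_interval_for("ftp://ftp.example.com/pub/logo.gif"))
print(refresh_interval_for("ftp://ftp.example.com/pub/README"))
```

A URL that matches no template falls back to the default, which stands in here for whatever server-wide refresh setting applies.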
Configuring the cache
You can configure the kind of caching you want for specific resources by using the Cache Configuration form. You can specify several configuration parameter values for URLs that match a regular expression pattern that you supply. This feature gives you fine-grained control of the proxy cache, based on the type of document cached. Configuring the cache can include identifying the following items:
Note
If you do not enable the caching of HTTPS documents, the proxy will assume the default, which is to not cache them.
You can set the policy for caching pages retrieved using HTTPS on the Cache Configuration form.
Caching pages that require authentication
You can choose to have your server cache files that require user authentication. If you choose to have your proxy server cache these files, the server will tag the files in the cache so that if a user asks for them, the server knows that the files require authentication from the remote server.
Because the proxy server does not know how remote servers authenticate, and does not know users' IDs or passwords, it simply forces an up-to-date check with the remote server each time a request is made for a document that requires authentication. The user therefore has to enter an ID and password to gain access to the file. If the user has already accessed that server earlier in the Navigator session, Navigator automatically sends the authentication information without prompting the user for it.
If you do not enable the caching of pages that require authentication, the proxy will assume the default, which is to not cache them.
You can set the policy for caching pages that require authentication on the Cache Configuration form.
Caching queries
Cached queries only work with HTTP documents.
You can limit the length of queries that are cached, or you can completely inhibit caching of queries. The longer the query, the less likely it is to be repeated, and the less useful it is to cache.
These caching restrictions apply: the access method must be GET, the document must not be protected (unless caching of authenticated pages is enabled), and the response must have at least a Last-Modified header. This requires the query engine to indicate that the query result document can be cached. If the Last-Modified header is present, the query engine should support the conditional GET method (with an If-Modified-Since header) in order to make caching effective; otherwise it should return an Expires header.
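The restrictions above amount to a short checklist, sketched here in Python. The function and parameter names are hypothetical, and max_query_length stands in for whatever query length limit you configure; the sketch assumes the limit applies to the part of the URL after the question mark.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit


def query_result_is_cacheable(
    method: str,
    url: str,
    response_headers: Mapping[str, str],
    protected: bool,
    cache_authenticated: bool = False,
    max_query_length: Optional[int] = None,
) -> bool:
    """Apply the query caching restrictions to one response."""
    # Only GET requests qualify.
    if method.upper() != "GET":
        return False
    # Protected documents qualify only if authenticated caching is enabled.
    if protected and not cache_authenticated:
        return False
    # Optional length limit on the query string (the part after "?").
    query = urlsplit(url).query
    if max_query_length is not None and len(query) > max_query_length:
        return False
    # HTTP header names are case-insensitive.
    names = {name.lower() for name in response_headers}
    return "last-modified" in names


headers = {"Last-Modified": "Sat, 01 Mar 1997 12:00:00 GMT"}
print(query_result_is_cacheable(
    "GET", "http://example.com/search?q=proxy", headers, protected=False))
```

Whether the query engine also honors conditional GET or returns an Expires header affects how effective the caching is, but not whether the response passes these checks.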
If you do not enable the caching of queries, the proxy will assume the default, which is to not cache them.
You can set the query cache policy on the Cache Configuration form.
Setting the minimum and maximum cache file sizes
You can set the minimum and maximum sizes for files that will be cached by your proxy server. You may want to set a minimum size if you have a fast network connection. If your connection is fast, small files may be retrieved so quickly that it is not necessary for the server to cache these files. In this instance, you would want to cache only larger files. You may want to set a maximum file size to make sure that large files do not occupy too much of your proxy's disk space.
You can set the minimum and maximum cache file sizes on the Cache Configuration form.
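The effect of the two limits can be pictured with a minimal Python sketch. The function name, the use of bytes as the unit, and the choice to treat both limits as inclusive are assumptions made for illustration only.

```python
from typing import Optional


def size_allows_caching(
    size_bytes: int,
    min_bytes: Optional[int] = None,
    max_bytes: Optional[int] = None,
) -> bool:
    """Return True if a file of this size falls within the configured limits."""
    if min_bytes is not None and size_bytes < min_bytes:
        return False  # too small to be worth caching on a fast connection
    if max_bytes is not None and size_bytes > max_bytes:
        return False  # too large; would take too much of the cache disk
    return True


# Hypothetical limits: cache only files between 4 KB and 10 MB.
print(size_allows_caching(512, min_bytes=4096, max_bytes=10 * 1024 * 1024))
print(size_allows_caching(65536, min_bytes=4096, max_bytes=10 * 1024 * 1024))
```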
Setting the cache behavior for client interruptions
If a document is only partly retrieved and the client interrupts the data transfer, the proxy has the ability to finish retrieving the document for the purpose of caching it. The proxy's default is to finish retrieving a document if 25% of it has already been retrieved. Otherwise, the proxy will terminate the remote server connection and remove the partial file. You can raise or lower the client interruption percentage on the Cache Configuration form.
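The decision described above can be sketched as a single comparison. This Python example is illustrative only: the function name is hypothetical, and it assumes that the proxy knows the document's total size (for example, from a Content-Length header) and that reaching the threshold exactly counts as having retrieved enough.

```python
def finish_retrieval_after_interrupt(
    bytes_received: int,
    total_bytes: int,
    threshold_percent: float = 25.0,
) -> bool:
    """Return True if the proxy should keep retrieving the document to cache it."""
    if total_bytes <= 0:
        # Without a known size the percentage cannot be computed.
        return False
    return bytes_received * 100 >= threshold_percent * total_bytes


print(finish_retrieval_after_interrupt(300, 1000))  # 30% retrieved
print(finish_retrieval_after_interrupt(100, 1000))  # 10% retrieved
```

With the default of 25%, the first call keeps the transfer going and the second one drops the connection and discards the partial file.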
Adding and modifying cache partitions
Cache partitions are reserved parts of disks or memory that are set aside for caching purposes. The largest cache capacity is 64 GB with 256 cache sections. If your caching capacity changes, you may want to change or add partitions using the Cache Partition Configuration form. From this form, you can edit a partition's location, mnemonic name, and maximum and minimum sizes. You can also view the cache section table for that partition.
To add cache partitions:
The cache root directory hierarchy
To enable or disable the cache monitor and manager:
Note
You can create, edit, and delete batch update configurations without having batch updates turned on. However, if you want your batch updates to be updated according to the times you set on the Cache Batch Updates form, you must turn updates on.
cbuild -d <conf-dir> -s <user>
where <conf-dir> is the directory where the proxy server instance is installed (for example, /usr/ns-home/proxy-id) and <user> is the user account that should own the created files and directories if you run cbuild as root. This user ID should be the same user ID that the proxy runs as. The utility determines the cache directory and the location of the cache database based on the directory you enter.
cbuild -c <cache-dir> -u <urldb-dir> -s <user>
where <cache-dir> is the directory for your cache structure, <urldb-dir> is the directory where the cache management information is kept, and <user> is the user account that should own the created files and directories if you run cbuild as root. This user ID should be the same user ID that the proxy runs as.
cbuild is located in the extras/proxy directory.
cupgrade -d <conf-dir> -o <1.1-cache-root> -s <user>
The <conf-dir> directory is where the proxy server is installed. For example, the directory could be /usr/ns-home/proxy-id. The utility determines the new cache directory and the location of the cache database based on the configuration files found in the directory you enter. The <1.1-cache-root> is the directory of the version 1.1 cache structure. The <user> is the Unix user ID that should own the files in the cache. It is optional and should be included only if you run the cupgrade utility as "root" and your proxy as another user. For example, you could run cupgrade as "root" and your proxy as "nobody". In this case you would replace <user> with "nobody".
Note
The user nobody does not work on some systems, such as HP-UX. On these systems, you must specify a user other than nobody for both the proxy and cupgrade.
The cache upgrade can take anywhere from a few minutes to several hours depending on the size of the old cache structure.
cupgrade <sect> <sect> ... <sect>
The 2.0 upgrade should be run in the cache directory where all of the cache sections reside. Each <sect> is a section in the cache that you want to upgrade. The number of arguments depends on how many sections are in the cache. For example, if your cache directory is /usr/ns-home/cache and you have a 1 GB cache, you have 8 sections in your cache directory. You should type the following at the command line:
cd /usr/ns-home/cache
cupgrade s3.0 s3.1 s3.2 s3.3 s3.4 s3.5 s3.6 s3.7
Instead of typing each section, you could simply use s* to pass all of the section directory names. In this instance, you would type the following:
cd /usr/ns-home/cache
cupgrade s*
If you have multiple cache partitions, you need to run the upgrade utility for each partition. For example, suppose your cache directory is /usr/ns-home/cache and you have a 2 GB cache with 16 sections on 2 partitions (8 sections on each partition). The partitions are /disk1/cache-1 and /disk2/cache-2. The syntax for the cupgrade utility would then be:
cd /usr/ns-home/cache/disk1/cache-1
cupgrade s4.00 s4.01 s4.02 s4.03 s4.04 s4.05 s4.06 s4.07
cd /usr/ns-home/cache/disk2/cache-2
cupgrade s4.08 s4.09 s4.10 s4.11 s4.12 s4.13 s4.14 s4.15
You could also upgrade all sections on both partitions by typing the following at the command line:
cupgrade /disk1/cache-1/s* /disk2/cache-2/s*
The cache upgrade can take anywhere from a few minutes to several hours depending on the size of the old cache structure.
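If you prefer to list the sections explicitly rather than use s*, the names in the examples above appear to follow a pattern: an 8-section cache uses s3.0 through s3.7 and a 16-section cache uses s4.00 through s4.15. The Python sketch below generates names on that inferred pattern. The rule it encodes (the base-2 logarithm of the section count, then a zero-padded index) is an assumption drawn only from those two examples, and the function name is hypothetical; check the names against your actual cache directory before relying on them.

```python
from typing import List


def section_names(section_count: int) -> List[str]:
    """Generate section directory names for a cache with this many sections."""
    # The examples only cover power-of-two counts (8 and 16 sections).
    if section_count < 1 or section_count & (section_count - 1):
        raise ValueError("section count must be a power of two")
    exponent = section_count.bit_length() - 1  # 8 -> 3, 16 -> 4
    width = len(str(section_count - 1))        # 8 -> 1 digit, 16 -> 2 digits
    return [f"s{exponent}.{i:0{width}d}" for i in range(section_count)]


names = section_names(16)
# Two partitions with 8 sections each, as in the example above.
print("cupgrade " + " ".join(names[:8]))
print("cupgrade " + " ".join(names[8:]))
```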
urldbgen -d <conf-dir> -s <user>
where <conf-dir> is the directory where the proxy server is installed (for example, /usr/ns-home/proxy-id) and <user> is the user account that should own the created files and directories if you run urldbgen as root. This user ID should be the same user ID that the proxy runs as. The utility determines the cache directory and the location of the cache database based on the directory you enter.
urldbgen -c <cache-dir> -u <urldb-dir> -s <user>
where <cache-dir> is the directory for your cache structure, <urldb-dir> is the directory where the cache URLs are recorded, and <user> is the user account that should own the created files and directories if you run urldbgen as root. This user ID should be the same user ID that the proxy runs as.
Note
urldbgc -d <conf-dir> -s <user>
where <conf-dir> is the directory where the proxy server is installed (for example, /usr/ns-home/proxy-id) and <user> is the user account that should own the created files and directories if you run urldbgc as root. This user ID should be the same user ID that the proxy runs as. The utility determines the cache directory and the location of the cache database based on the directory you enter.
urldbgc -c <cache-dir> -u <urldb-dir> -s <user>
where <cache-dir> is the directory for your cache structure, <urldb-dir> is the directory where the cache URL database is kept, and <user> is the user account that should own the created files and directories if you run urldbgc as root. This user ID should be the same user ID that the proxy runs as.
Note
If you do not want to garbage collect but want to delete all of the files in your cache, type the following at the command line:
cd <proxy directory>/cache
find s* -type f -exec rm {} \;
where <proxy directory> is the directory where your proxy is kept.