NewCache - a requirements spec v0.01

From: Graham Leggett <minfrin(at)sharp.fm>
Date: Thu Feb 22 2001 - 21:23:56 EST


Hi all,

This is a preliminary discussion about a proposed caching module in Apache v2.0. It's a sort of a requirements specification, if you will.

The design is based entirely on the proxy caching described in RFC2616, and is rather tricky. As a result I've tried to describe things very simply at the beginning, and then to layer on each new piece of complexity so that the big picture is not overwhelming.

===

mod_cache


Requirements


The purpose of any cache is to make the transfer of information through or from a system more efficient. A cache is a tradeoff between a number of attributes; in our case the tradeoffs and requirements are:

  • Bandwidth conservation - We want to transfer as few bytes over the network as possible.
  • CPU cycle conservation - We want our webservers to do as little crunching as is possible. Less crunching means less computing horsepower, and thus a smaller and faster server.
  • Memory - We cache data in memory - memory is traded off for performance above.
  • Disk - We cache data to disk - disk space is traded off for performance.
  • Caching everything - We cache all data, from static data on disk, to dynamically generated data, to data pulled from another server through a reverse proxy.
  • We use the control techniques described in RFC2616 as a "public cache".
  • THE DESIGN MUST BE EASY TO FOLLOW AND UNDERSTAND.

Caching - The Simple View

There are two tasks a cache module must perform at a basic level:

  • Place new cached data into the cache
  • Serve cached data from the cache

These two functions are handled by two separate halves of the cache: A content generator "Cache Out", and a filter "Cache In":

                +-------------------------+
                |         Browser         |
                +-------------------------+
                    |              ^  ^
                    |              |  |
                    v              |  |
             +-----------+   Y     |  |
             | Cache Out |---------+  |
             +-----------+            |
                    |                 |
                    | N          +----------+
                    |            | Cache In |
                    |            +----------+
                    v                 ^
             +-------------+          |
             |    Apache   |----------+
             +-------------+ 

Very simplistically described, a request from a web browser is first intercepted by the "Cache Out" content generator. If the request is cached, the cached data is returned and the request ends immediately. If not, the content generator does nothing and the rest of Apache is responsible for generating the content.

At the other end, the "Cache In" filter is responsible for putting content generated by Apache into the cache. This module directs data either to memory or to disk (or a combination of both) depending on the configuration of the cache.
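As a rough illustration of the two halves, here is a minimal sketch in Python. It is not Apache code: the names cache_out, cache_in and handle_request, and the plain dict used as the store, are all made up for this example.

```python
from typing import Callable, Dict, Optional

# Hypothetical in-memory store for this sketch: URL -> response body.
cache: Dict[str, bytes] = {}


def cache_out(url: str) -> Optional[bytes]:
    """Content generator: return the cached body, or None to decline."""
    return cache.get(url)


def cache_in(url: str, body: bytes) -> bytes:
    """Filter: store the generated body, then pass it on unchanged."""
    cache[url] = body
    return body


def handle_request(url: str, generate: Callable[[str], bytes]) -> bytes:
    """Try "Cache Out" first; otherwise generate and run "Cache In"."""
    cached = cache_out(url)
    if cached is not None:
        return cached  # "Y" branch: the request ends here
    # "N" branch: the rest of the server generates the content.
    return cache_in(url, generate(url))
```

Calling handle_request twice for the same URL should invoke generate only once, since the second call is answered by cache_out.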

Caching - The Slightly More Complicated View


Of course, caching isn't actually this easy. Complications set in when we note that data is not simply "inside" or "not inside" the cache, but also of varying freshness.

RFC2616 describes mechanisms for specifying how long an item in the cache can remain fresh. When a cached entity expires and is no longer fresh, we do not simply discard the cached data. Instead, the "Cache Out" content generator modifies the browser request slightly, turning it into a conditional request, and hands it down to the rest of Apache.

The "Cache In" filter looks at the result of this conditional request. If the result is "304 Not Modified", then the "Cache In" filter fulfils the request from the cache just as the "Cache Out" content generator would have at the start.

If the result is not "304 Not Modified", it means there will be new data on the way. The "Cache In" filter places the data in the cache as normal, replacing whatever was there before, and the data is passed to the browser as normal.
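The freshness handling described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Entry record, the serve function and the origin callback signature are hypothetical, an ETag is assumed as the validator, and refreshing the expiry time on a 304 is my own simplification of what RFC2616 asks for.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Entry:
    body: bytes
    etag: str          # validator used to build the conditional request
    expires_at: float  # absolute time after which the entry is stale


# Hypothetical origin callback:
# (url, if_none_match) -> (status, body, etag, max_age_seconds)
Origin = Callable[[str, Optional[str]], Tuple[int, bytes, str, float]]


def serve(cache: Dict[str, Entry], url: str, origin: Origin,
          now: Optional[float] = None) -> bytes:
    now = time.time() if now is None else now
    entry = cache.get(url)
    # "Cache Out": a fresh entry is served at once.
    if entry is not None and now < entry.expires_at:
        return entry.body
    # "Cache Out": a stale entry turns the request into a conditional one.
    validator = entry.etag if entry is not None else None
    status, body, etag, max_age = origin(url, validator)
    # "Cache In": 304 means the stale copy is still good, so serve it.
    if status == 304 and entry is not None:
        entry.expires_at = now + max_age  # assumption: extend freshness
        return entry.body
    # "Cache In": anything else replaces what was there before.
    # Simplification: every such response is treated as cacheable.
    cache[url] = Entry(body, etag, now + max_age)
    return body
```

The now parameter exists only so the sketch can be exercised without waiting for real time to pass.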

              +----------------------------------------+
              |         Browser                        |
              +----------------------------------------+
                  |                ^             ^  ^
                  |                |             |  |
                  v                | Y           |  |
           +-----------+  Y      +-----------+   |  |
           | Cache Out |-------->| Cache Out |   |  +-----+
           | in cache? |         | fresh?    |   |        |
           +-----------+         +-----------+   |    +----------+
                  | N                      | N   |    | Cache In |
                  | +-------------------+  |     |    | serve    |
                  +-| Cache Out         |<-+     |    | from     |
                  | | force conditional |        |    | cache    |
                  | +-------------------+        |    +----------+ 
                  |                              |        |
                  v                            N |      Y |
           +-------------+              +---------------------+
           |    Apache   |--------------| Cache In            |
           +-------------+              | force conditional & |
                                        | 304 Not Modified?   |
                                        +---------------------+

In addition to the above, RFC2616 also defines ways to determine whether an object is cacheable or not. Depending on the value of the Cache-Control (and possibly other) headers, the "Cache In" and "Cache Out" modules decide whether an object is cacheable at all. If not, these modules take action to tell the "Storage Manager" (coming soon) to delete the objects from the cache if necessary.
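A very small sketch of such a cacheability check, looking at the response's Cache-Control header only. The function names are hypothetical, the header dict is assumed to use the exact key "Cache-Control", and the parsing is deliberately naive: directive arguments are ignored, so a qualified private="..." is treated as plain private, which errs on the side of not caching.

```python
from typing import Dict, Set


def parse_cache_control(value: str) -> Set[str]:
    """Return the lower-cased directive names, ignoring any arguments."""
    names = set()
    for part in value.split(","):
        name = part.split("=", 1)[0].strip().lower()
        if name:
            names.add(name)
    return names


def is_cacheable(response_headers: Dict[str, str]) -> bool:
    """Shared-cache check on Cache-Control only; other headers are ignored."""
    directives = parse_cache_control(response_headers.get("Cache-Control", ""))
    # no-store forbids storing at all; private forbids storing in a
    # shared cache. no-cache is not listed because it allows storing
    # but requires revalidation before reuse.
    return not ({"no-store", "private"} & directives)
```

A real implementation would also need to consider the request headers, the response status and the other headers the text alludes to.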

Caching - The Plot Thickens


Yes, it gets even more complicated, but not really.

HTTP/1.1 (RFC2616) supports content negotiation. In a nutshell this means that a single URL can have a number of representations: The language might be different, or the data might have a special content encoding, or it might be compressed. This means that different browsers can get different data in response to the same request for the same URL. The cache needs to handle this in an intelligent fashion.

To do this, we break down the cache code again and introduce a new bit:

  • "Cache Out" - The content generator
  • "Cache In" - the filter
  • "Storage Manager" - the bit that handles the actual storing of the data, either on disk or in RAM.

To keep the cache code simple we say that the "Cache Out" and "Cache In" modules have no knowledge whatsoever of content negotiation. All they do is give the URL and the request headers to the "Storage Manager", and using the combination of URL and request headers the "Storage Manager" makes the decision as to whether an object is cached or not, or whether an object should be replaced.

So, we could see four (or more) different objects in the cache for the same URL, each with their own independently defined freshness, and each treated entirely separately from the other:

                                                 +------------+
                                           +-----| Normal     |
                          +---------+      |     +------------+
                  +------>| English |------+
                  |       +---------+      |     +------------+
                  |                        +-----| Compressed |
   +-------+      |                              +------------+
   |  URL  |------+
   +-------+      |                              +------------+
                  |                        +-----| Normal     |
                  |       +---------+      |     +------------+
                  +------>| French  |------+
                          +---------+      |     +------------+
                                           +-----| Compressed |
                                                 +------------+
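
One way to picture the tree above is a store keyed on the URL plus the request headers that select a variant. The sketch below is an assumption-laden illustration: the StorageManager class, variant_key and the fixed VARIANT_HEADERS list are invented here, and header values are matched as exact strings rather than negotiated properly.

```python
from typing import Dict, Optional, Tuple

# Assumption: the headers that select a variant are fixed in advance.
# A real cache would take them from the response's Vary header (RFC2616).
VARIANT_HEADERS = ("accept-language", "accept-encoding")

Key = Tuple[str, Tuple[str, ...]]


def variant_key(url: str, request_headers: Dict[str, str]) -> Key:
    """Combine the URL with the normalised variant-selecting headers."""
    lowered = {k.lower(): v.strip().lower() for k, v in request_headers.items()}
    return (url, tuple(lowered.get(h, "") for h in VARIANT_HEADERS))


class StorageManager:
    """Each (URL, headers) combination is a separate cached object."""

    def __init__(self) -> None:
        self._objects: Dict[Key, bytes] = {}

    def store(self, url: str, request_headers: Dict[str, str],
              body: bytes) -> None:
        self._objects[variant_key(url, request_headers)] = body

    def fetch(self, url: str,
              request_headers: Dict[str, str]) -> Optional[bytes]:
        return self._objects.get(variant_key(url, request_headers))
```

With this layout the English and French variants of one URL never overwrite each other, and "Cache In" and "Cache Out" only ever pass the URL and headers along.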

The "Storage Manager" is a modular design - add-on modules allow you to cache to shared memory, or disk, or to other cache storage mechanisms still to be invented.
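
To make the modular idea concrete, here is a sketch of a common interface with two interchangeable backends. None of these names come from Apache: StorageBackend, MemoryBackend and DiskBackend are hypothetical, and the disk layout (one file per key, named by a SHA-256 hash) is just one possible choice.

```python
import hashlib
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, Optional


class StorageBackend(ABC):
    """Interface every add-on storage module would implement."""

    @abstractmethod
    def put(self, key: str, body: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...


class MemoryBackend(StorageBackend):
    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self._data[key] = body

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class DiskBackend(StorageBackend):
    def __init__(self, directory: str) -> None:
        self._dir = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash the key so any URL maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self._dir, digest)

    def put(self, key: str, body: bytes) -> None:
        with open(self._path(key), "wb") as f:
            f.write(body)

    def get(self, key: str) -> Optional[bytes]:
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None


def roundtrip(backend: StorageBackend) -> bool:
    """Store one object and read it back through the common interface."""
    backend.put("k", b"v")
    return backend.get("k") == b"v"


def demo() -> bool:
    """Run the same round trip against both backends."""
    with tempfile.TemporaryDirectory() as tmp:
        return all(roundtrip(b) for b in (MemoryBackend(), DiskBackend(tmp)))
```

The caller in roundtrip never needs to know which backend it was given, which is the point of keeping the "Storage Manager" modular.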


Caching - The Complicated Bit


Just when you thought that was it!

It has been pointed out that storing both compressed and uncompressed versions of the same object representation in the cache is a waste of resources. Although the cache tries very hard to remain transparent to the content that is being cached, there are some optimisations that can be made to speed up the process. The best place for this to happen is in an "Optimisation Layer" sandwiched between the "Cache In" and "Cache Out" modules, and the "Storage Manager".

   +-----------+
   | Cache Out |-----+
   +-----------+     |    +--------------------+    +-----------------+
                     +--->| Optimisation Layer |--->| Storage Manager |
   +-----------+     |    +--------------------+    +-----------------+
   | Cache In  |-----+
   +-----------+

The optimisation layer is designed to perform some optimisations on the data going into and out of the cache.

Some optimisations include:

  • Compression:

If uncompressed data is being put into the "Storage Manager", the "Optimisation Layer" compresses the data before putting it in the cache.

If uncompressed data is requested from the "Storage Manager", the "Optimisation Layer" will uncompress the data on the fly before passing it on back to either the "Cache In" or "Cache Out" modules.


In both of these cases, neither the "Cache In", "Cache Out" nor "Storage Manager" modules need worry about these optimisations.
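
The compression case could look something like the sketch below. It assumes gzip as the compression format (the text above does not name one), and both OptimisationLayer and the DictStorage stand-in are hypothetical names for this example.

```python
import gzip
from typing import Dict, Optional


class DictStorage:
    """Stand-in "Storage Manager": a plain dict keyed by string."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self.data[key] = body

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)


class OptimisationLayer:
    """Sits in front of the storage and keeps only compressed copies."""

    def __init__(self, storage: DictStorage) -> None:
        self._storage = storage

    def put(self, key: str, body: bytes, compressed: bool = False) -> None:
        # Uncompressed data is compressed on the way in.
        self._storage.put(key, body if compressed else gzip.compress(body))

    def get(self, key: str, want_compressed: bool = False) -> Optional[bytes]:
        stored = self._storage.get(key)
        if stored is None:
            return None
        # Uncompressed data is produced on the fly on the way out.
        return stored if want_compressed else gzip.decompress(stored)
```

Only one copy of each representation is stored, and the callers on either side see plain put and get calls.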

These optimisations also need not depend at all on other modules in Apache, such as mod_gzip.


Regards,
Graham

-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight..."
Received on Fri Feb 23 01:17:11 2001

This archive was generated by hypermail 2.1.8 : Thu Aug 24 2006 - 14:53:14 EDT

