Squid 2.4 Stable1
Configuration Manual
 

   

GLOSSARY


parent

In a parent relationship, the child cache forwards requests to its parent cache. If the parent does not hold a requested object, it forwards the request on behalf of the child. A cache hierarchy should closely follow the underlying network topology. Parent caches should be located along the network paths towards the greater Internet. For example, if your Internet Service Provider (ISP) operates a cache, it should probably be a parent to yours, since your Web traffic will have to travel along your ISP's infrastructure anyway.


sibling

In a sibling relationship, a peer may only request objects already held in the cache; a sibling cannot forward cache misses on behalf of the peer. The sibling relationship should be used for caches "nearby" but not in the direction of your route to the Internet. For example, it may make sense for a number of department-specific caches within an organization to have sibling relationships among them. This approach is even more compelling when there is no parent cache available for the organization as a whole.


Multicast and Unicast

A unicast packet goes from one machine to exactly one other machine. All TCP connections are unicast, since they can only have one destination host for each source host. UDP packets are almost always unicast too, though in some cases they can be sent to the broadcast address so that they reach every machine on the local network.

A multicast packet goes from one machine to one or more others. The difference between a multicast packet and a broadcast packet is that hosts receiving multicast packets can be on different LANs, and that each multicast data stream is only transmitted between networks once, not once per machine on the remote network. Rather than each machine connecting to a video server, for example, the multicast data is streamed per network, and multiple machines just listen in on the multicast data once it is on the network.


Netmask

An IP address has two components, the network address and the host address. For example, consider the IP address 172.16.1.25. Assuming this is part of a Class B network, the first two numbers (172.16) represent the Class B network address, and the second two numbers (1.25) identify a particular host on this network.

Subnetting enables the network administrator to further divide the host part of the address into two or more subnets. In this case, a part of the host address is reserved to identify the particular subnet. This is easier to see if we show the IP address in binary format. The full address is:

10101100.00010000.00000001.00011001
The Class B network part is:
10101100.00010000
and the host address is:
00000001.00011001
Suppose the subnet mask for this IP address is 255.255.255.0, or 11111111.11111111.11111111.00000000 in binary.
With this mask, the first 8 bits of the host address are reserved for identifying the subnet, which divides the network into 256 subnets.
The subnet address is obtained by a bitwise AND of the IP address and the subnet mask:
10101100.00010000.00000001.00000000
Hence the resulting subnet address is 172.16.1.0. The subnet covers the addresses 172.16.1.0 to 172.16.1.255, of which 172.16.1.1 to 172.16.1.254 are usable host addresses.
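
The bitwise AND described above can be checked with a short Python sketch. It uses only the standard ipaddress module. The function names subnet_address and to_binary are illustrative and are not part of Squid.

```python
import ipaddress


def subnet_address(ip: str, mask: str) -> str:
    """Return the subnet address: the bitwise AND of an IPv4 address and its mask."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))


def to_binary(ip: str) -> str:
    """Render a dotted-quad IPv4 address as four 8-bit binary groups."""
    return ".".join(f"{int(octet):08b}" for octet in ip.split("."))


print(to_binary("172.16.1.25"))
print(subnet_address("172.16.1.25", "255.255.255.0"))
```

Running it for 172.16.1.25 with the mask 255.255.255.0 prints the binary form of the address and then the subnet address, so the manual calculation can be compared against it.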


FileSystems in Squid

The cache_dir type in Squid has nothing to do with the underlying filesystem type; it defines the storage method (implementation).

Currently Squid has 4 different implementations:
ufs :-  On top of a normal filesystem supporting directories and files.
aufs :-  As "ufs", but using threads to implement non-blocking disk I/O.
diskd :-  As "ufs", but using a separate process to implement non-blocking disk I/O.
coss :-  An experimental "raw" filesystem, where all objects are stored in one big file.
Other storage methods are being worked on.

diskd is designed to work around the problem of blocking I/O in a Unix process. aufs (asyncufs) gets around this by using threads to complete disk I/O, while diskd uses external processes to complete disk I/O.

aufs works just that little bit faster, but only works on systems where threads can do asynchronous disk I/O without blocking the main process. Systems with user-level threads (e.g. FreeBSD) cannot use it effectively. diskd, being implemented as an external process, gets around this. If the cache is only lightly loaded, the difference cannot be noticed; diskd and aufs are only useful when the cache is under high load.

In case it was not clear, asynchronous I/O (diskd/aufs) is beneficial for single-drive configurations with "higher" request loads, in many cases allowing you to push about 100% more I/O through the drive before latency creeps up too high.

For multiple-drive configurations, asynchronous I/O is almost a requirement to be able to use the I/O capacity of the extra drives. Without it, a multiple-disk configuration is effectively limited to almost the speed of a single-disk configuration. With asynchronous I/O, the disk I/O scales quite well (at least for the first few drives; other limits get very apparent when you have more than ~3 drives).


Cache_peer Options

Proxy-only

Data retrieved from this remote cache will not be stored locally, but retrieved again on any subsequent request. By default Squid will store objects it retrieves from other caches: by having the object available locally it can return the object fast if it's ever requested again. While this is good for latency, it can be a waste of bandwidth, especially if the other cache is on the same piece of ethernet. In the examples section of this chapter, we use this option when load-balancing between two cache servers.

Weight

If more than one cache server has an object (based on the result of an ICP query), Squid fetches the data from the cache that responded fastest. If you want to prefer one cache over another, you can add a weight value to the preferred cache's config line. Larger values are preferred. Squid times how long each ICP request takes (in milliseconds) and divides the time by the weight value, using the cache with the smallest result. Your weight value should thus not be unreasonably large.
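
The weighting rule can be illustrated with a minimal Python sketch. This is not Squid's actual code: the function pick_peer, the peer names and the timings are hypothetical, and a peer without an explicit weight is assumed to have a weight of 1.

```python
def pick_peer(reply_times_ms, weights):
    """Pick the peer with the smallest (ICP reply time / weight) value.

    reply_times_ms maps a peer name to its measured ICP reply time in ms.
    weights maps a peer name to its weight; missing peers default to 1
    (an assumption made for this sketch).
    """
    return min(
        reply_times_ms,
        key=lambda peer: reply_times_ms[peer] / weights.get(peer, 1),
    )


# Hypothetical measurements: cache-a answered first, but cache-b is weighted.
times = {"cache-a.example": 30, "cache-b.example": 50}
print(pick_peer(times, {}))
print(pick_peer(times, {"cache-b.example": 2}))
```

With no weights the faster peer wins. With a weight of 2 on the slower peer its adjusted time becomes 25 ms, so it is chosen instead. This shows why a very large weight would make a peer win regardless of how slowly it responds.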

ttl

An outgoing multicast packet has a ttl (Time To Live) value, which is used to ensure that loops are not created. Each time a packet passes through a router, the router decrements this ttl value, and the value is then checked. Once the value reaches zero, the packet is dropped. If you want multicast packets to stay on your local network, you would set the ttl value to 1. The first router to see the packet would decrement the ttl, discover it was zero and discard the packet. This value gives you a level of control over how many multicast routers will see the packet. You should set this value carefully, so that you limit packets to your local network or immediate multicast peers. Larger multicast groups are seldom of any use: they generate too many responses, and when geographically dispersed, may simply add latency. You also don't want crackers picking up all your ICP requests by joining the appropriate multicast group.

No-query

Squid will send ICP requests to all configured caches. The response time is measured, and used to decide which parent to send the HTTP request to. There is another function of these requests: if there is no response to a request, the cache is marked down. If you are communicating with a cache that does not support ICP, you must use the no-query option: if you don't, Squid will consider that cache down, and attempt to go directly to the destination server. (If you want, you can set the ICP port on the config line to point to the echo port, port 7. Squid will then use this port to check if the machine is available. Note that you will have to configure inetd.conf to support the UDP echo port.) This option is normally used in conjunction with the default option.

Default

This sets the host to be the proxy of last resort. If no other cache matches a rule (due to acl or domain filtering), this cache is used. If you have only one way of reaching the outside world, and it doesn't support ICP, you can use the default and no-query options to ensure that all queries are passed through it. If this cache is then down, the client will see an error message (without these options, Squid would attempt to route around the problem).

round-robin

This option must be used on more than one cache_peer line to be useful. Connections to caches configured with this option are spread evenly (round-robined) among the caches. This can be used by client caches to communicate with a group of loaded parents, so that load is spread evenly. If you have multiple Internet connections, with a parent cache on each side, you can use this option to do some basic load-balancing of the connections.
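
As a rough illustration of what "spread evenly" means here, the following Python sketch hands out peers in strict rotation. It is only a model of the idea, not Squid's implementation, and the host names are hypothetical.

```python
from itertools import cycle


def round_robin(peers):
    """Return an endless iterator that yields the peers in rotation."""
    return cycle(peers)


# Hypothetical parents, each assumed to carry the round-robin option.
parents = round_robin(["parent1.example", "parent2.example"])
assignments = [next(parents) for _ in range(4)]
print(assignments)
```

Four consecutive connections alternate between the two parents, so each receives half of them.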

no-netdb-exchange

If your cache was configured to keep ICMP (ping) timing information with the --enable-icmp configure option, your cache will attempt to retrieve the remote machine's ICMP timing information from any peers. If you don't want this to happen (or the remote cache doesn't support it), you can use the no-netdb-exchange option to stop Squid from requesting this information from the cache.

no-delay

Hits from other caches will normally be included in a client's delay-pool information. If you have two caches load-balancing, you don't want the hits from the other cache to be limited. You may also want hits from caches in a nearby hierarchy to come down at full speed, not to be limited as if they were misses. Use the no-delay option to ensure that requests come down at their full speed.

login

Caches can be configured to use usernames and passwords on accesses. To authenticate with a parent cache, you can enter a username and password using this tag. Note that the HTTP protocol makes authenticating to multiple cache servers impossible: you cannot chain together a string of proxies, each one requiring authentication. You should only use this option if this is a personal proxy.


Probe

Squid will wait for up to dead_peer_timeout seconds after sending out an ICP request before deciding to ignore a peer. With a multicast group, peers can leave and join at will, and it should make no difference to a client. This presents a problem for Squid: it can't wait for a number of seconds each time (if the caches are on the same network and responses come back in milliseconds, the waiting just adds latency). Squid gets around this problem by sending ICP probes to the multicast address occasionally. Each host in the group responds to the probe, so Squid knows how many machines are currently in the group. When sending a real request, Squid will wait until it gets at least as many responses as were returned in the last probe: if more arrive, great. If fewer arrive, though, Squid will wait until the dead_peer_timeout value is reached. If there is still no reply, Squid marks that peer as down, so that all connections are not held up by one peer.


What is the httpd-accelerator mode?

An accelerator caches incoming requests for outgoing data (i.e., that which you publish to the world). It takes load away from your HTTP server and internal network. You move the server away from port 80 (or whatever your published port is), and substitute the accelerator, which then pulls the HTTP data from the "real" HTTP server (only the accelerator needs to know where the real server is). The outside world sees no difference (apart from an increase in speed, with luck).

The httpd_accel_uses_host_header Option

The first line of a normal HTTP request consists of three values: the type of transfer (normally a GET, which is used for downloads); the path and filename to be retrieved (or executed, in the case of a CGI program); and the HTTP version.

This layout is fine if you only have one web site on a machine. On systems where you have more than one site, though, it makes life difficult: the request does not contain enough information, since it doesn't include information about the destination domain. Most operating systems allow you to have IP aliases, where you have more than one IP address per network card. By allocating one IP per hosted site, you could run one web server per IP address. Once the programs were made more efficient, one running program could act as a server for many sites: the only requirement was that you had one IP address per domain. Server programs would find out which of the IP addresses clients were connected to, and would serve data from different directories for each IP.

There are a limited number of IP addresses, and they are fast running out. Some systems also have a limited number of IP aliases, which means that you cannot host more than a (fairly arbitrary) number of web sites on a machine. If the client were to pass the destination host name along with the path and filename, the web server could listen to only one IP address, and would find the right destination directories by looking in a simple hostname table.

From version 1.1 on, the HTTP standard supports a special Host header, which is passed along with every outgoing request. This header also makes transparent caching and acceleration easier: by pulling the host value out of the headers, Squid can translate a standard HTTP request to a cache-specific HTTP request, which can then be handled by the standard Squid code. Turning on the httpd_accel_uses_host_header option enables this translation. You will need to use this option when doing transparent caching.

It's important to note that acls are checked before this translation. You must combine this option with strict source-address checks, so you cannot use this option to accelerate multiple backend servers (this is certain to change in a later version of Squid).
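
The translation from a standard request to a cache-style request can be sketched in Python as follows. This is an illustration of the idea only, not Squid's code: the function to_absolute_url is hypothetical, headers are assumed to arrive as a plain dictionary, and only the http scheme is handled.

```python
def to_absolute_url(request_line, headers):
    """Combine the path from the request line with the Host header.

    request_line is the first line of the request, e.g.
    "GET /index.html HTTP/1.1". If no Host header is present, or the
    request already carries a full URL, the path is returned unchanged.
    """
    method, path, version = request_line.split()
    host = headers.get("Host")
    if host is None or path.startswith("http://"):
        return path
    return f"http://{host}{path}"


print(to_absolute_url("GET /index.html HTTP/1.1", {"Host": "www.example.com"}))
```

Without the Host header the request contains only the path, which is why the destination site cannot be recovered from it.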

Access.log details

The native access.log has ten (10) fields. There is one entry here for each HTTP (client) request and each ICP query. HTTP requests are logged when the client socket is closed. A single dash (‘-’) indicates unavailable data.

1. Timestamp
The time when the client socket is closed. The format is “Unix time” (seconds since Jan 1, 1970) with millisecond resolution. This can be converted to a readable format with “cat access.log | perl -nwe 's/^(\d+)/localtime($1)/e; print'”.

2. Elapsed Time
The elapsed time of the request, in milliseconds. This is time between the accept() and close() of the client socket.

3. Client Address
The IP address of the connecting client, or the FQDN if the ‘log_fqdn’ option is enabled in the config file.

4. Log Tag / HTTP Code
The Log Tag describes how the request was treated locally (hit, miss, etc). All the tags are described below. The HTTP code is the reply code taken from the first line of the HTTP reply header. Non-HTTP requests may have zero reply codes.

5. Size
The number of bytes written to the client.

6. Request Method
The HTTP request method, or ICP_QUERY for ICP requests.

7. URL
The requested URL.

8. Ident
If ‘ident_lookup’ is on, this field may contain the username associated with the client connection as derived from the ident service.

9. Hierarchy Data / Hostname
A description of how and where the requested object was fetched.

10. Content Type
The Content-type field from the HTTP reply.
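
The ten fields listed above can be pulled apart with a small Python sketch. The names AccessLogEntry and parse_line and the sample line are made up for illustration. The sketch assumes that fields are separated by whitespace and that no field contains embedded spaces.

```python
from typing import NamedTuple


class AccessLogEntry(NamedTuple):
    timestamp: float      # Unix time with millisecond resolution
    elapsed_ms: int       # time between accept() and close()
    client: str           # client IP address or FQDN
    log_tag: str          # e.g. TCP_MISS
    http_code: int        # reply code, may be 0 for non-HTTP requests
    size: int             # bytes written to the client
    method: str           # request method or ICP_QUERY
    url: str
    ident: str            # "-" when unavailable
    hierarchy: str        # hierarchy data / hostname
    content_type: str


def parse_line(line: str) -> AccessLogEntry:
    """Split one native access.log line into its ten fields."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    ts, elapsed, client, tag_code, size, method, url, ident, hier, ctype = fields
    log_tag, _, http_code = tag_code.partition("/")
    return AccessLogEntry(float(ts), int(elapsed), client, log_tag,
                          int(http_code), int(size), method, url,
                          ident, hier, ctype)


# Hypothetical sample line, for illustration only.
sample = ("981173030.123    235 10.0.0.5 TCP_MISS/200 4512 GET "
          "http://www.example.com/ - DIRECT/192.0.2.10 text/html")
entry = parse_line(sample)
print(entry.log_tag, entry.http_code, entry.elapsed_ms)
```

The fourth field is split at the slash into the log tag and the HTTP code, which gives eleven values from ten fields.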

Access Log Tag / HTTP Code

“TCP_” refers to requests on the HTTP port.

TCP_HIT
A valid copy of the requested object was in the cache.

TCP_MISS
The requested object was not in the cache.

TCP_REFRESH_HIT
The object was in the cache, but STALE. An If-Modified-Since request was made and a “304 Not Modified” reply was received.

TCP_REF_FAIL_HIT
The object was in the cache, but STALE. The request to validate the object failed, so the old (stale) object was returned.

TCP_REFRESH_MISS
The object was in the cache, but STALE. An If-Modified-Since request was made and the reply contained new content.

TCP_CLIENT_REFRESH
The client issued a request with the “no-cache” pragma.

TCP_CLIENT_REFRESH_MISS
The client issued a "no-cache" pragma, or some analogous cache control command, along with the request. Thus, the cache has to refetch the object from the origin server. This is typically a user pushing the reload button, forcing the proxy to check for a new copy (it is also triggered by selecting a bookmark in some browser versions).
In short, the browser forced the proxy to check for a new version.

TCP_IMS_HIT
The client issued an If-Modified-Since request and the object was in the cache and still fresh. TCP_HIT and TCP_IMS_HIT are both hits; the only difference is that in the TCP_IMS_HIT case the browser already had an up-to-date version, so there was no need to send the Squid cached copy to the requestor.

TCP_IMS_MISS
The client issued an If-Modified-Since request for a stale object.

TCP_SWAPFAIL
The object was believed to be in the cache, but could not be accessed.

TCP_DENIED
Access was denied for this request.

“UDP_” refers to requests on the ICP port.

UDP_HIT
A valid copy of the requested object was in the cache.

UDP_HIT_OBJ
Same as UDP_HIT, but the object data was small enough to be sent in the UDP reply packet. Saves the following TCP request.

UDP_MISS
The requested object was not in the cache.

UDP_DENIED
Access was denied for this request.

UDP_INVALID
An invalid request was received.

UDP_RELOADING
The ICP request was "refused" because the cache is busy reloading its metadata.


Refresh Pattern

Squid switched from a Time-To-Live based expiration model to a Refresh-Rate model. Objects are no longer purged from the cache when they expire. Instead of assigning TTL’s when the object enters the cache, we now check freshness requirements when objects are requested. If an object is “fresh” it is given directly to the client. If it is “stale” then we make an If-Modified-Since request for it. When checking the object freshness, we calculate these values:

AGE is how much the object has aged since it was retrieved:

AGE = NOW - OBJECT_DATE

LM_AGE is how old the object was when it was retrieved:

LM_AGE = OBJECT_DATE - LAST_MODIFIED_TIME

LM_FACTOR is the ratio of AGE to LM_AGE:

LM_FACTOR = AGE / LM_AGE

CLIENT_MAX_AGE is the (optional) maximum object age the client will accept as taken from the HTTP/1.1 Cache-Control request header. EXPIRES is the (optional) expiry time from the server reply headers. These values are compared with the parameters of the ‘refresh_pattern’ rules. The refresh parameters are:

URL regular expression

MIN_AGE

PERCENT

MAX_AGE

The URL regular expressions are checked in the order listed until a match is found. Then this algorithm is applied for determining if an object is fresh or stale:

if (CLIENT_MAX_AGE)
    if (AGE > CLIENT_MAX_AGE)
        return STALE
if (AGE <= MIN_AGE)
    return FRESH
if (EXPIRES) {
    if (EXPIRES <= NOW)
        return STALE
    else
        return FRESH
}
if (AGE > MAX_AGE)
    return STALE
if (LM_FACTOR < PERCENT)
    return FRESH
return STALE

Note that the Max-Age in a client request takes the highest precedence. The ‘MIN’ value should normally be set to zero since it has higher precedence than the server’s Expires: value. But if you wish to override the Expires: headers, you may use the MIN value.
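
The freshness algorithm can be written out as a runnable Python sketch to experiment with different rule values. It follows the pseudocode above, but it is not Squid's source. The function name is_fresh is hypothetical, all times are assumed to be in seconds, PERCENT is assumed to be given as a percentage (20 meaning 20%), and an object with no usable Last-Modified time is assumed to be stale at the final step.

```python
def is_fresh(now, object_date, last_modified, min_age, percent, max_age,
             expires=None, client_max_age=None):
    """Return True if the object is FRESH, False if it is STALE."""
    age = now - object_date

    # The client's maximum age takes the highest precedence.
    if client_max_age is not None and age > client_max_age:
        return False
    # MIN_AGE is checked before the server's Expires value.
    if age <= min_age:
        return True
    if expires is not None:
        return expires > now
    if age > max_age:
        return False
    # LM_FACTOR = AGE / LM_AGE, compared against PERCENT.
    if last_modified is not None:
        lm_age = object_date - last_modified
        if lm_age > 0 and age / lm_age < percent / 100.0:
            return True
    return False


# Object fetched 100 s ago that had not changed for 900 s before that.
print(is_fresh(now=1000, object_date=900, last_modified=0,
               min_age=0, percent=20, max_age=10000))
```

Calling the function with a non-zero min_age and an already passed expires value returns fresh, which demonstrates how the MIN value overrides the Expires: header.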


Terms in delay pool

pool: a collection of bucket groups as appropriate to a given class

bucket group: a group of buckets within a pool, such as the per-host bucket group, the per-network bucket group or the aggregate bucket group (the aggregate bucket group is actually a single bucket)

bucket: an individual delay bucket represents a traffic allocation, which is replenished at a given rate (up to a given limit) and causes traffic to be delayed when empty

class: the class of a delay pool determines how the delay is applied, i.e., whether the different client IPs are treated separately or as a group (or both)

class 1: a class 1 delay pool contains a single unified bucket, which is used for all requests from hosts subject to the pool

class 2: a class 2 delay pool contains one unified bucket and 255 buckets, one for each host on an 8-bit network (IPv4 class C)

class 3: a class 3 delay pool contains 255 buckets for the subnets in a 16-bit network, and individual buckets for every host on these networks (IPv4 class B)
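
The behaviour of a single bucket (an allocation replenished at a given rate up to a limit, with traffic delayed when it is empty) can be modelled in a few lines of Python. This is a simplified sketch, not Squid's implementation. The class name DelayBucket and its methods are hypothetical, and the bucket is assumed to start full.

```python
class DelayBucket:
    """A single delay bucket holding a traffic allocation in bytes."""

    def __init__(self, restore_rate, maximum):
        self.restore_rate = restore_rate  # bytes added per second
        self.maximum = maximum            # upper limit of the bucket
        self.level = maximum              # assumption: bucket starts full

    def replenish(self, seconds):
        """Add the allocation earned over the given time, up to the limit."""
        self.level = min(self.maximum,
                         self.level + self.restore_rate * seconds)

    def take(self, wanted):
        """Grant as many bytes as are available; 0 means traffic must wait."""
        granted = max(0, min(wanted, self.level))
        self.level -= granted
        return granted


bucket = DelayBucket(restore_rate=1000, maximum=5000)
first = bucket.take(4000)    # plenty available
second = bucket.take(4000)   # only the remainder is granted
third = bucket.take(10)      # bucket is empty, so nothing is granted
bucket.replenish(2)          # two seconds later
print(first, second, third, bucket.level)
```

A class 1 pool would correspond to one such bucket shared by all hosts, while class 2 and class 3 pools would hold a bucket per host or per subnet in addition.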


Ftp Login Information

Squid can act as a proxy server for various Internet protocols. The most commonly used protocol is HTTP, but the File Transfer Protocol (FTP) is still alive and well.

FTP was written for authenticated file transfer (it requires a username and password). To provide public access, a special account is created: the anonymous user. When you log into an FTP server you use this as your username. As a password you generally use your email address. Most browsers these days automatically enter a useless email address.

It’s polite to give an address that works, though. If one of your users abuses a site, it allows the site admin to get hold of you easily.

Squid allows you to set the email address that is used with the ftp_user tag. You should probably create a squid@yourdomain.example email address specifically for people to contact you on.

There is another reason to enter a proper address here: some servers require a real email address. For your proxy to log into these FTP servers you will have to enter a real email address here.


Effective User and Group ID

Squid can only bind to low numbered ports (such as port 80) if it is started as root. Squid is normally started by your system’s rc scripts when the machine boots. Since these scripts run as root, Squid is started as root at bootup time.

Once Squid has been started, however, there is no need to run it as root. Good security practice is to run programs as root only when it’s absolutely necessary, and for this reason Squid changes user and group IDs once it has bound to the incoming network port.

The cache_effective_user and cache_effective_group tags tell Squid what IDs to change to. The Unix security system would be useless if it allowed all users to change their IDs at will, so Squid only attempts to change IDs if the main program is started as root.

If you do not have root access to the machine, and are thus not starting Squid as root, you can simply leave this option commented out. Squid will then run with whatever user ID starts the actual Squid binary.

As discussed in chapter 2, this book assumes that you have created both a squid user and a squid group on your cache machine. The above tags should thus both be set to “squid”.


Timeouts

Half-closed clients: clients that shut down the sending side of their TCP connections while leaving their receiving sides open are termed half-closed clients.

Fully closed clients: the clients and servers have exchanged their acknowledgements (requests and responses) before closing.

IDENT: Squid will make an RFC931/ident request for client connections if 'ident_lookup' is enabled in the config file. Currently, the ident value is only logged with the request in the access.log. It is not currently possible to use the ident return value for access control purposes.

URN:

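SIGHUP or SIGTERM: system signals sent to running processes on Linux and other Unix systems. SIGTERM asks a process to shut down; SIGHUP is conventionally used to tell a daemon to reread its configuration.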


External Programs

htpasswd: the Apache password-file utility. You can use it to create password files for Squid as well. The syntax is:
htpasswd [ -c ] passwdfile username

Redirector: Squid now has the ability to rewrite requested URLs. Implemented as an external process (similar to a dnsserver), Squid can be configured to pass every incoming URL through a 'redirector' process that returns either a new URL, or a blank line to indicate no change.

The redirector program is NOT a standard part of the Squid package. However, there are a couple of user-contributed redirectors in the "contrib/" directory. Since everyone has different needs, it is up to the individual administrators to write their own implementation. For testing, and a place to start, this very simple Perl script can be used:

#!/usr/local/bin/perl
$|=1;
print while (<>);

The redirector program must read URLs (one per line) on standard input, and write rewritten URLs or blank lines on standard output. Note that the redirector program can not use buffered I/O. Squid writes.
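
For comparison, the same input/output contract can be sketched in Python. This is only a starting point under stated assumptions: each input line is treated as a bare URL, as described above, and the rewrite rule (mapping the made-up prefix http://old.example/ to http://new.example/) is purely hypothetical. Output is flushed after every line so that it is never held back in a buffer.

```python
import sys

# Hypothetical rewrite rule, for illustration only.
OLD_PREFIX = "http://old.example/"
NEW_PREFIX = "http://new.example/"


def rewrite(url):
    """Return the rewritten URL, or an empty string to indicate no change."""
    if url.startswith(OLD_PREFIX):
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    return ""


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read one URL per line and answer each with exactly one line."""
    for line in stdin:
        stdout.write(rewrite(line.strip()) + "\n")
        stdout.flush()  # never leave a reply sitting in a buffer

# When installing this as a redirector, add a call to main() at the
# bottom of the script.
```

The stdin and stdout parameters exist so that the loop can be exercised with in-memory streams before the script is connected to a running cache.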