Squid 2.4 Stable1 Configuration Manual
In a parent relationship, the child cache will forward requests to its
parent cache. If the parent does not hold a requested object, it will forward
the request on behalf of the child. A cache hierarchy should closely follow
the underlying network topology. Parent caches should be located along the
network paths towards the greater Internet. For example, if your Internet
Service Provider (ISP) operates a cache, it should probably be a parent
to yours, since your Web traffic will have to travel along your ISP's infrastructure
anyway.
In a sibling relationship, a peer may only request objects already held in the sibling's cache; a sibling cannot forward cache misses on behalf of the peer. The sibling relationship should be used for caches "nearby" but not in the direction of your route to the Internet. For example, it may make sense for a number of department-specific caches within an organization to have sibling relationships among them. This approach is even more compelling when there is no parent cache available for the organization as a whole.
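The practical difference between the two relationships is what a peer does on a miss. The Python sketch below models it. The `Peer` class and the `fetch_origin` callback are hypothetical names for illustration, not Squid internals.

```python
from typing import Callable, Dict, Optional


class Peer:
    """Toy model of a neighbour cache and how it treats a request."""

    def __init__(self, name: str, relationship: str, cached: Dict[str, bytes]):
        if relationship not in ("parent", "sibling"):
            raise ValueError("relationship must be 'parent' or 'sibling'")
        self.name = name
        self.relationship = relationship
        self.cached = cached

    def handle(self, url: str, fetch_origin: Callable[[str], bytes]) -> Optional[bytes]:
        """Return the object, or None when this peer will not supply it."""
        if url in self.cached:
            return self.cached[url]      # a hit: parents and siblings both serve it
        if self.relationship == "parent":
            obj = fetch_origin(url)      # a parent fetches the miss for its child
            self.cached[url] = obj       # ...and keeps a copy, being a cache itself
            return obj
        return None                      # a sibling never forwards a miss
```

In squid.conf the relationship is chosen by the type field of each cache_peer line (parent or sibling).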
A unicast packet is the complete opposite: one machine is talking to only
one other machine. All TCP connections are unicast, since they can only
have one destination host for each source host. UDP packets are almost always
unicast too, though in some cases they are sent to the broadcast address so that they
reach every single machine on the network.
A multicast packet is sent from one machine to one or more others. The difference between
a multicast packet and a broadcast packet is that hosts receiving multicast
packets can be on different LANs, and that each multicast data stream is
only transmitted between networks once, not once per machine on the remote
network. Rather than each machine connecting to a video server, the multicast
data is streamed per network, and multiple machines just listen in on the
multicast data once it's on the network.
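To make the three delivery types concrete, this Python sketch uses the standard `ipaddress` module to classify a destination address. The helper name `classify` and the example network are assumptions. IPv4 multicast addresses are those in 224.0.0.0/4.

```python
import ipaddress


def classify(address: str, network: str) -> str:
    """Classify a destination as 'multicast', 'broadcast' or 'unicast'.

    `network` is the local network (e.g. '172.16.1.0/24'); it is needed to
    recognise that network's broadcast address.
    """
    ip = ipaddress.ip_address(address)
    if ip.is_multicast:                       # 224.0.0.0/4 for IPv4
        return "multicast"
    net = ipaddress.ip_network(network)
    limited_broadcast = ipaddress.ip_address("255.255.255.255")
    if ip == limited_broadcast or ip == net.broadcast_address:
        return "broadcast"
    return "unicast"
```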
An IP address has two components, the network address and the host address.
For example, consider the IP address 172.16.1.25. Assuming this is part
of a Class B network, the first two numbers (172.16) represent the Class
B network address, and the second two numbers (1.25) identify a particular
host on this network.
Subnetting enables the network administrator to further divide the host
part of the address into two or more subnets. In this case, part of the
host address is reserved to identify the particular subnet. This is easier
to see if we write the IP address in binary. The full address is:
10101100.00010000.00000001.00011001
The Class B network part is:
10101100.00010000
and the host part is:
00000001.00011001
Suppose the subnet mask for this IP address is 255.255.255.0, which is
11111111.11111111.11111111.00000000 in binary. The mask reserves the first 8 bits
of the host part (here 00000001) for identifying the subnet, which divides the
Class B network into 256 subnets.
The subnet address is obtained by a bitwise AND of the IP address and the subnet mask:
10101100.00010000.00000001.00000000
Hence the subnet address is 172.16.1.0. This subnet covers the addresses 172.16.1.0
to 172.16.1.255: 172.16.1.0 is the subnet address itself and 172.16.1.255 is the
subnet's broadcast address, so the usable host addresses are 172.16.1.1 to 172.16.1.254.
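The masking step can be checked with Python's standard `ipaddress` module. The helper names `subnet_address` and `dotted_binary` are hypothetical. The arithmetic is the bitwise AND described above.

```python
import ipaddress


def subnet_address(ip: str, mask: str) -> str:
    """Return the subnet address: a bitwise AND of the address and the mask."""
    ip_bits = int(ipaddress.IPv4Address(ip))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_bits & mask_bits))


def dotted_binary(ip: str) -> str:
    """Render an IPv4 address as four 8-bit groups separated by dots."""
    return ".".join(f"{octet:08b}" for octet in ipaddress.IPv4Address(ip).packed)
```

The library can also do this in one step: `ipaddress.ip_network("172.16.1.25/255.255.255.0", strict=False)` yields the network 172.16.1.0/24.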
The cache_dir type in Squid has nothing to do with the underlying filesystem
type; it defines the storage method / implementation.
Currently Squid has 4 different implementations:
ufs :- On top of a normal filesystem supporting directories and
files.
aufs :- As "ufs", but using threads to implement non-blocking
disk I/O.
diskd :- As "ufs", but using a separate process to implement
non-blocking disk I/O.
coss :- An experimental "raw" filesystem, where all objects are
stored in one big file.
Other storage methods are being worked on.
diskd and aufs are similar. Both are designed to work around the problem of blocking I/O in a
Unix process. asyncufs (aufs) gets around it by using threads to complete disk
I/O; diskd uses external processes to complete disk I/O.
aufs works just that little bit faster, but only works on systems where
threads can do asynchronous disk I/O without blocking the main process. Systems with
user-level threads (e.g. FreeBSD) cannot use it effectively. diskd, being implemented
as an external process, gets around this. If the cache is only lightly loaded,
the difference is not noticeable: diskd and aufs are only useful when the cache
is under high load.
In case it was not clear, asynchronous I/O (diskd/aufs) is beneficial for single
drive configurations with "higher" request loads, in many cases allowing you
to push about 100% more I/O through the drive before latency creeps up too high.
For multiple drive configurations it is almost a requirement to be able to
use the I/O capacity of the extra drives. Without it, a multiple disk configuration
is effectively limited to almost the speed of a single disk configuration.
With asynchronous I/O, disk I/O scales quite well (at least for the first
few drives; other limits become very apparent when you have more than ~3 drives).
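aufs itself is C code built on POSIX threads. The Python sketch below only illustrates the pattern it relies on: blocking disk reads are handed to a pool of worker threads so the main loop is not stalled while the disk is busy. The function names and the default worker count are assumptions.

```python
import concurrent.futures
import os
import tempfile


def read_file(path: str) -> bytes:
    # A plain blocking read; it runs in a worker thread, not the main loop.
    with open(path, "rb") as f:
        return f.read()


def read_objects_async(paths: list, workers: int = 4) -> dict:
    """Hand blocking reads to a thread pool and collect results as they finish."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(read_file, p): p for p in paths}
        # The main thread is free while reads are in flight; this sketch
        # simply gathers each object as soon as its read completes.
        for fut in concurrent.futures.as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(3):
            p = os.path.join(d, f"obj{i}")
            with open(p, "wb") as f:
                f.write(f"object {i}".encode())
            paths.append(p)
        print(read_objects_async(paths))
```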
Proxy-only
Data retrieved from this remote cache will not be stored locally, but retrieved
again on any subsequent request. By default Squid will store objects it
retrieves from other caches: by having the object available locally it can
return the object fast if it's ever requested again. While this is good for
latency, it can be a waste of bandwidth, especially if the other cache is
on the same piece of Ethernet. In the examples section of this chapter, we
use this option when load-balancing between two cache servers.
Weight
If more than one cache server has an object (based on the result of an ICP
query), Squid fetches the object from the cache that responded
fastest. If you want to prefer one cache over another, you can add a weight
value to the preferred cache's config line. Larger values are preferred.
Squid times how long each ICP request takes (in milliseconds), and divides
the time by the weight value, using the cache with the smallest result. Your
weight values should thus stay modest: a very large weight makes a cache win
even when it is much slower to respond.
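The selection rule is a single division per peer. In this sketch, `pick_peer` and its input layout are hypothetical, and a peer without a weight option is assumed to have a weight of 1.

```python
def pick_peer(replies: dict) -> str:
    """Pick the peer with the smallest ICP round-trip time divided by weight.

    `replies` maps peer name -> (rtt_ms, weight).
    """
    return min(replies, key=lambda name: replies[name][0] / replies[name][1])
```

For example, `pick_peer({"a": (40, 1), "b": (60, 2)})` chooses "b" (60 / 2 = 30 < 40). An oversized weight is exactly the risk described above: `pick_peer({"near": (5, 1), "far": (900, 200)})` picks "far", because 900 / 200 = 4.5 is smaller than 5.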
ttl
An outgoing multicast packet has a ttl (Time To Live) value, which is used
to ensure that loops are not created. Each time a packet passes through
a router, the router decrements this ttl value, and the value is then checked.
Once the value reaches zero, the packet is dropped. If you want multicast
packets to stay on your local network, you would set the ttl value to 1.
The first router to see the packet would decrement the ttl, discover
that it was zero and discard the packet. This value gives you a level of control
over how many multicast routers will see the packet. You should set this value
carefully, so that you limit packets to your local network or immediate
multicast peers (larger multicast groups are seldom of any use: they generate
too many responses, and when geographically dispersed, may simply add latency.
You also don't want crackers picking up all your ICP requests by joining
the appropriate multicast group.)
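At the socket level, the multicast TTL is a per-socket option. The Python sketch below sets it to 1 so that packets stay on the local network. The function name is hypothetical. It shows the underlying mechanism only and does not reproduce Squid's own code.

```python
import socket
import struct


def multicast_sender(ttl: int = 1) -> socket.socket:
    """Create a UDP socket whose outgoing multicast packets carry `ttl`.

    With ttl=1 the first router decrements the value to zero and drops
    the packet, so it never leaves the local network.
    """
    if not 0 <= ttl <= 255:
        raise ValueError("ttl must fit in one byte")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Many platforms take this option as a single byte, hence struct.pack.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl))
    return sock
```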
No-query
Squid will send ICP requests to all configured caches. The response time
is measured, and used to decide which parent to send the HTTP request to.
There is another function of these requests: if there is no response to
a request, the cache is marked down. If you are communicating with a cache
that does not support ICP, you must use the no-query option: if you don't,
Squid will consider that cache down, and attempt to go directly to the destination
server. (If you want, you can set the ICP port on the config line to point
to the echo port, port 7. Squid will then use this port to check if the machine
is available. Note that you will have to configure inetd.conf to support the
UDP echo port.) This option is normally used in conjunction with the default
option.
Default
This sets the host to be the proxy of last resort. If no other cache matches
a rule (due to acl or domain filtering), this cache is used. If you have
only one way of reaching the outside world, and it doesn't support ICP,
you can use the default and no-query options to ensure that all queries
are passed through it. If this cache is then down, the client will see an
error message (without these options, Squid would attempt to route around
the problem).
round-robin
This option must be used on more than one cache_peer line to be useful.
Connections to caches configured with this option are spread evenly (round-robined)
among the caches. This can be used by client caches to communicate with
a group of loaded parents, so that load is spread evenly. If you have multiple
Internet connections, with a parent cache on each side, you can use this
option to do some basic load-balancing of the connections.
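The basic idea of round-robin selection is a simple rotation through the configured peers. The Python sketch below is hypothetical. It ignores peer status, and Squid's own selection may differ in detail.

```python
import itertools


class RoundRobin:
    """Hand out peers in turn, so connections are spread evenly."""

    def __init__(self, peers: list):
        if not peers:
            raise ValueError("need at least one peer")
        self._cycle = itertools.cycle(peers)

    def next_peer(self) -> str:
        return next(self._cycle)
```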
no-netdb-exchange
If your cache was configured to keep ICMP (ping) timing information with
the --enable-icmp configure option, your cache will attempt to retrieve
the remote machine's ICMP timing information from any peers. If you don't
want this to happen (or the remote cache doesn't support it), you can use
the no-netdb-exchange option to stop Squid from requesting this information
from the cache.
no-delay
Hits from other caches are normally counted against a client's delay pool.
If you have two caches load-balancing, you don't want the hits
from the other cache to be limited. You may also want hits from caches in
a nearby hierarchy to come down at full speed, not to be limited as if they
were misses. Use the no-delay option to ensure that requests come down at
their full speed.
login
Caches can be configured to use usernames and passwords on accesses. To
authenticate with a parent cache, you can enter a username and password
using this tag. Note that the HTTP protocol makes authenticating to multiple
cache servers impossible: you cannot chain together a string of proxies,
each one requiring authentication. You should only use this option if this
is a personal proxy.
Squid will wait for up to dead_peer_timeout seconds after sending out an ICP request before deciding to ignore a peer. With a multicast group, peers can leave and join at will, and it should make no difference to a client. This presents a problem for Squid: it can't wait for a number of seconds each time (what if the caches are on the same network, and responses come back in milliseconds? The waiting just adds latency.) Squid gets around this problem by occasionally sending ICP probes to the multicast address. Each host in the group responds to the probe, so Squid knows how many machines are currently in the group. When sending a real request, Squid waits until it gets at least as many responses as were returned in the last probe: if more arrive, great. If fewer arrive, though, Squid waits until the dead_peer_timeout value is reached. If there is still no reply, Squid marks that peer as down, so that all connections are not held up by one peer.
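The waiting rule can be expressed as a small function of the reply arrival times. This Python sketch is hypothetical. Times are in milliseconds here, whereas dead_peer_timeout is configured in seconds.

```python
def stop_waiting_at(arrivals_ms: list, expected: int, timeout_ms: float) -> float:
    """Return when Squid would stop waiting for multicast ICP replies.

    arrivals_ms: times (ms after sending) at which replies arrived.
    expected:    number of replies seen in the last multicast probe.
    timeout_ms:  the dead_peer_timeout value, converted to milliseconds.
    """
    if expected < 1:
        raise ValueError("expected must be at least 1")
    in_time = sorted(t for t in arrivals_ms if t <= timeout_ms)
    if len(in_time) >= expected:
        return in_time[expected - 1]   # the expected count arrived: stop now
    return timeout_ms                  # too few replies: wait out the timeout
```

On a fast local network, three replies arriving at 5, 8 and 12 ms end the wait after 12 ms instead of a full timeout.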
“TCP_” refers to requests on the HTTP port.
TCP_HIT
A valid copy of the requested object was in the cache.
TCP_MISS
The requested object was not in the cache.
TCP_REFRESH_HIT
The object was in the cache, but STALE. An If-Modified-Since request was
made and a “304 Not Modified” reply was received.
TCP_REF_FAIL_HIT
The object was in the cache, but STALE. The request to validate the object
failed, so the old (stale) object was returned.
TCP_REFRESH_MISS
The object was in the cache, but STALE. An If-Modified-Since request was
made and the reply contained new content.
TCP_CLIENT_REFRESH
The client issued a request with the “no-cache” pragma.
TCP_CLIENT_REFRESH_MISS
The client issued a "no-cache" pragma, or some analogous cache control
command along with the request. Thus, the cache has to refetch the object
from the origin server. This happens when users press the reload button, forcing the
proxy to check for a new copy (it is also triggered by selecting a bookmark in some
browser versions).
In short, the browser forced the proxy to check for a new version.
TCP_IMS_HIT
The client issued an If-Modified-Since
request and the object was in the cache and still fresh. TCP_HIT and TCP_IMS_HIT
are both hits; the only difference is that in the TCP_IMS_HIT case the browser
already had an up-to-date version, so there was no need to send the Squid-cached
copy to the requestor.
TCP_IMS_MISS
The client issued an If-Modified-Since request for a stale object.
TCP_SWAPFAIL
The object was believed to be in the cache, but could not be accessed.
TCP_DENIED
Access was denied for this request.
“UDP_” refers to requests on the ICP port.
UDP_HIT
A valid copy of the requested object was in the cache.
UDP_HIT_OBJ
Same as UDP_HIT, but the object data was small enough to be sent in the
UDP reply packet. Saves the following TCP request.
UDP_MISS
The requested object was not in the cache.
UDP_DENIED
Access was denied for this request.
UDP_INVALID
An invalid request was received.
UDP_RELOADING
The ICP request was "refused" because the cache is busy reloading its metadata.
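These tags appear in Squid's access.log. The Python sketch below counts them and computes a simple hit ratio. It assumes Squid's native log format, in which the fourth whitespace-separated field reads "TAG/status" (e.g. TCP_MISS/200). Following the text above, it counts only TCP_HIT and TCP_IMS_HIT as hits. You might reasonably include tags such as TCP_REFRESH_HIT as well. The function names are hypothetical.

```python
from collections import Counter


def tally_results(log_lines: list) -> Counter:
    """Count result tags (TCP_HIT, UDP_MISS, ...) in native-format log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 4:
            continue                          # skip blank or truncated lines
        tag = fields[3].split("/", 1)[0]      # 'TCP_MISS/200' -> 'TCP_MISS'
        counts[tag] += 1
    return counts


def tcp_hit_ratio(counts: Counter) -> float:
    """Fraction of TCP_ (HTTP-port) requests logged as TCP_HIT or TCP_IMS_HIT."""
    tcp_total = sum(n for tag, n in counts.items() if tag.startswith("TCP_"))
    hits = counts["TCP_HIT"] + counts["TCP_IMS_HIT"]
    return hits / tcp_total if tcp_total else 0.0
```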
Squid switched from a Time-To-Live based expiration model to a Refresh-Rate
model. Objects are no longer purged from the cache when they expire. Instead
of assigning TTLs when the object enters the cache, we now check
freshness requirements when objects are requested. If an object is “fresh”
it is given directly to the client. If it is “stale” then we
make an If-Modified-Since request for it. When checking the object freshness,
we calculate these values:
AGE is how much the object has aged since it was retrieved:
AGE = NOW - OBJECT_DATE
LM_AGE is how old the object was when it was retrieved:
LM_AGE = OBJECT_DATE - LAST_MODIFIED_TIME
LM_FACTOR is the ratio of AGE to LM_AGE:
LM_FACTOR = AGE / LM_AGE
CLIENT_MAX_AGE is the (optional) maximum object age the client will accept, as taken from
the HTTP/1.1 Cache-Control request header. EXPIRES is the (optional) expiry
time from the server reply headers. These values are compared with the parameters
of the ‘refresh_pattern’ rules. The refresh parameters are:
URL regular expression
MIN_AGE
PERCENT
MAX_AGE
The URL regular expressions are checked in the order listed until a match
is found. Then this algorithm is applied for determining if an object is
fresh or stale:
if (CLIENT_MAX_AGE)
    if (AGE > CLIENT_MAX_AGE)
        return STALE
if (AGE <= MIN_AGE)
    return FRESH
if (EXPIRES) {
    if (EXPIRES <= NOW)
        return STALE
    else
        return FRESH
}
if (AGE > MAX_AGE)
    return STALE
if (LM_FACTOR < PERCENT)
    return FRESH
return STALE
Note that the Max-Age in a client request takes the highest precedence.
The ‘MIN’ value should normally be set to zero since it has
higher precedence than the server’s Expires: value. But if you wish
to override the Expires: headers, you may use the MIN value.
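The freshness check above can be transcribed into runnable Python as a sketch. Several details here are assumptions. All times are in seconds, so convert refresh_pattern's minute values before calling. PERCENT is given as a percentage, such as 20, and compared with LM_FACTOR as a fraction. An object with no positive LM_AGE is treated as stale. The function name is hypothetical.

```python
from typing import Optional


def is_fresh(now: float, object_date: float, last_modified: float,
             min_age: float, percent: float, max_age: float,
             expires: Optional[float] = None,
             client_max_age: Optional[float] = None) -> bool:
    """Apply the refresh_pattern freshness rules in the order listed above."""
    age = now - object_date                    # AGE
    lm_age = object_date - last_modified       # LM_AGE
    if client_max_age is not None and age > client_max_age:
        return False                           # client Max-Age wins first
    if age <= min_age:
        return True                            # MIN beats the server's Expires
    if expires is not None:
        return expires > now                   # STALE once EXPIRES <= NOW
    if age > max_age:
        return False
    if lm_age <= 0:
        return False                           # assumption: no usable LM_AGE
    lm_factor = age / lm_age                   # LM_FACTOR
    return lm_factor < percent / 100
```

For an object fetched at t=1000 that was last modified at t=0, a 20% rule keeps it fresh at t=1100 (LM_FACTOR 0.1) and marks it stale at t=1300 (LM_FACTOR 0.3).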
Pool: a collection of bucket groups as appropriate to a given class.
Bucket group: a group of buckets within a pool, such as the per-host
bucket group, the per-network bucket group or the aggregate bucket group
(the aggregate bucket group is actually a single bucket).
Bucket: an individual delay bucket represents a traffic allocation,
which is replenished at a given rate (up to a given limit) and causes traffic
to be delayed when empty.
Class: the class of a delay pool determines how the delay is applied,
i.e. whether the different client IPs are treated separately or as a group
(or both).
Class 1: a class 1 delay pool contains a single unified bucket, which
is used for all requests from hosts subject to the pool.
Class 2: a class 2 delay pool contains one unified bucket and 255
buckets, one for each host on an 8-bit network (IPv4 class C).
Class 3: a class 3 delay pool contains 255 buckets for the subnets in a 16-bit network,
and individual buckets for every host on these networks (IPv4 class B).
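A bucket as defined above behaves like a token bucket. The Python sketch below models one bucket and a class 2 pool, in which traffic must fit into both the aggregate bucket and the client's per-host bucket. Several details are assumptions rather than Squid's code: the class and function names, buckets starting full, and keying hosts by the last octet of the client's IPv4 address. Time is passed in explicitly so the behaviour is easy to follow.

```python
class Bucket:
    """One delay bucket: refilled at `rate` bytes/second, up to `limit` bytes."""

    def __init__(self, rate: float, limit: float, now: float = 0.0):
        self.rate = rate
        self.limit = limit
        self.level = limit      # assumption: a new bucket starts full
        self.last = now

    def refill(self, now: float) -> None:
        self.level = min(self.limit, self.level + (now - self.last) * self.rate)
        self.last = now


def pass_through(buckets: list, nbytes: float, now: float) -> bool:
    """Pass nbytes only if every bucket on the path has room; else delay it."""
    for b in buckets:
        b.refill(now)
    if all(nbytes <= b.level for b in buckets):
        for b in buckets:
            b.level -= nbytes
        return True
    return False


class Class2Pool:
    """One aggregate bucket plus one bucket per host on an 8-bit network."""

    def __init__(self, agg_rate: float, agg_limit: float,
                 host_rate: float, host_limit: float):
        self.aggregate = Bucket(agg_rate, agg_limit)
        self.host_rate, self.host_limit = host_rate, host_limit
        self.hosts: dict = {}

    def try_send(self, client_ip: str, nbytes: float, now: float) -> bool:
        last_octet = int(client_ip.rsplit(".", 1)[1])
        if last_octet not in self.hosts:
            self.hosts[last_octet] = Bucket(self.host_rate, self.host_limit, now)
        return pass_through([self.aggregate, self.hosts[last_octet]], nbytes, now)
```

One busy host drains only its own bucket. Other hosts keep their own allocation, limited only by the shared aggregate bucket.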
Squid can act as a proxy server for various Internet protocols. The most
commonly used protocol is HTTP, but the File Transfer Protocol (FTP) is
still alive and well.
FTP was written for authenticated file transfer (it requires a username
and password). To provide public access, a special account is created: the
anonymous user. When you log into an FTP server you use this as your username.
As a password you generally use your email address. Most browsers these days
automatically enter a useless email address.
It’s polite to give an address that works, though. If one of your
users abuses a site, it allows the site admin to get hold of you easily.
Squid lets you set the email address it uses with the ftp_user
tag. You should probably create a squid@yourdomain.example email
address specifically for people to contact you on.
There is another reason to enter a proper address here: some servers require
a real email address. For your proxy to log into these FTP servers you will
have to enter a real email address here.
Squid can only bind to low numbered ports (such as port 80) if it is started
as root. Squid is normally started by your system’s rc scripts when
the machine boots. Since these scripts run as root, Squid is started as
root at bootup time.
Once Squid has been started, however, there is no need to run it as root.
Good security practice is to run programs as root only when it’s absolutely
necessary, and for this reason Squid changes user and group IDs once
it has bound to the incoming network port.
The cache_effective_user and cache_effective_group tags tell
Squid what IDs to change to. The Unix security system would be useless
if it allowed all users to change their IDs at will, so Squid only
attempts to change IDs if the main program is started as root.
If you do not have root access to the machine, and are thus not starting
Squid as root, you can simply leave this option commented out. Squid will
then run with whatever user ID starts the actual Squid binary.
As discussed in chapter 2, this book assumes that you have created both
a squid user and a squid group on your cache machine. The above tags should
thus both be set to “squid”.
Half-closed clients: clients that shut down the sending side
of their TCP connection while leaving the receiving side open. Such a client
has finished sending its request but still expects to read the reply.
Fully closed clients: the client and the server have exchanged their
requests and responses, and both have closed their sides of the connection.
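A half-closed connection can be demonstrated with a connected socket pair. In this Python sketch, `socket.socketpair()` stands in for a client-to-Squid TCP connection and is a Unix-domain pair on most platforms. `shutdown(SHUT_WR)` is the same call a TCP client uses to half-close. The function name is hypothetical.

```python
import socket


def half_close_demo() -> tuple:
    """Client half-closes after its request; the server still replies."""
    client, server = socket.socketpair()
    try:
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        client.shutdown(socket.SHUT_WR)       # half-close: no more sending
        request = b""
        while chunk := server.recv(4096):     # server reads until EOF
            request += chunk
        server.sendall(b"HTTP/1.0 200 OK\r\n\r\n")
        server.close()                        # server side now fully closed
        reply = b""
        while chunk := client.recv(4096):     # client can still receive
            reply += chunk
        return request, reply
    finally:
        client.close()
        server.close()
```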
IDENT: Squid will make an RFC931/ident request for client connections if 'ident_lookup' is enabled in the config file. Currently, the ident value is only logged with the request in the access.log. It is not currently possible to use the ident return value for access control purposes.
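The ident exchange is a one-line query and a one-line reply. Under RFC 1413, which supersedes RFC 931, the query is sent to TCP port 113 on the client's machine. Because the ident server runs on the client host, the query names the client's source port first and Squid's listening port second. This Python sketch only formats the query and parses the reply. The function names are hypothetical.

```python
from typing import Optional


def ident_query(client_port: int, squid_port: int) -> bytes:
    """Build an RFC 1413 query: '<port on ident host> , <port on querier>'."""
    return f"{client_port} , {squid_port}\r\n".encode("ascii")


def parse_ident_reply(line: str) -> Optional[str]:
    """Return the user id from a USERID reply, or None for an ERROR reply."""
    # Reply: '<ports> : USERID : <opsys> : <user-id>'; the id may contain ':'.
    parts = [p.strip() for p in line.strip().split(":", 3)]
    if len(parts) == 4 and parts[1].upper() == "USERID":
        return parts[3]
    return None
```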
URN:
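SIGHUP or SIGTERM: Unix signals sent to running processes (they are not specific
to Linux). SIGTERM asks a process to shut down. SIGHUP conventionally asks a daemon
to reread its configuration; Squid reconfigures itself when it receives SIGHUP.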
Htpasswd: Apache's password file utility. You can also use it to create
password files for Squid. The syntax is:
htpasswd [ -c ] passwdfile username
(-c creates a new password file; omit it when adding users to an existing file.)
Redirector: Squid now has the ability to rewrite requested URLs.
Implemented as an external process (similar to a dnsserver), Squid can be
configured to pass every incoming URL through a 'redirector' process that
returns either a new URL, or a blank line to indicate no change.
The redirector program is NOT a standard part of the Squid package. However,
there are a couple of user-contributed redirectors in the "contrib/" directory.
Since everyone has different needs, it is up to the individual administrators
to write their own implementation. For testing, and a place to start, this
very simple Perl script can be used:
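#!/usr/local/bin/perl
# Minimal redirector: echoes each input line back, so no URL is changed.
$|=1;               # disable output buffering so Squid gets each reply at once
print while (<>);   # copy standard input to standard output, line by line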
The redirector program must read URLs (one per line) on standard input,
and write rewritten URLs or blank lines on standard output. Note that the
redirector program can not use buffered I/O. Squid writes.
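As a starting point in another language, here is a Python sketch of a redirector that actually rewrites something. It assumes the URL is the first whitespace-separated field of each input line. The `REWRITES` table and its hosts are hypothetical. For unmatched URLs it answers with a blank line ("no change"), and it flushes after every reply because the redirector must not buffer its output.

```python
import sys

# Hypothetical rewrite table for illustration only (not part of Squid):
# URLs starting with the first prefix are redirected to the second.
REWRITES = {"http://www.example.com/": "http://mirror.example.net/"}


def rewrite(line: str) -> str:
    """Return a rewritten URL, or "" to tell Squid to leave the URL unchanged."""
    fields = line.split()
    if not fields:
        return ""
    url = fields[0]
    for prefix, replacement in REWRITES.items():
        if url.startswith(prefix):
            return replacement + url[len(prefix):]
    return ""


def main() -> None:
    """Read one request per line from Squid and answer each immediately."""
    for line in iter(sys.stdin.readline, ""):
        sys.stdout.write(rewrite(line) + "\n")
        sys.stdout.flush()  # unbuffered replies: Squid waits for each answer
```

To use it, call main() as the script's entry point (for example under an `if __name__ == "__main__":` guard) and point Squid's redirect_program setting at the script.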