CSCi 5131 -- Lab 1: Web Proxy
Due: September 30 in class
Introduction:
Web proxies sit between
the client (normally a browser) and a web server performing many
useful roles such as caching, filtering, etc. They pass requests from
the browser through to the server (if there is a cache miss), and
responses from the server back to the browser. In this lab, you will
implement a simple Web proxy that speaks HTTP on one end (to a web
server) and interacts using a simple message-based interface to/from a
client. In the lab, you will apply Java network programming principles
discussed in class, implement network protocols including a small
portion of HTTP, use multithreading, and develop a client-server
architecture (in some sense, your proxy is both a client and a
server). We will also suggest some extra-credit options should you
require more challenges (though points will be minimal).
A simplified proxy architecture is shown in the figure
below.

1) The Web proxy reads a request from a browser and processes the request
according to its filter policy.
2) The Web proxy forwards the modified request to the web server.
3) The Web server sends the response back to the web proxy.
4) The Web proxy processes the response and forwards it to the browser.
Details:
You will program your proxy
in Java (not PERL) and endow it with the ability to gather simple statistics on
the traffic flowing through it, filter requests based on some simple
criteria, support multiple concurrent connections, and cache documents.
You will also evaluate the performance of
your proxy. We will give you some skeleton code for the proxy that you may use or ignore.
Either way, you will implement a full-fledged web proxy as specified below.
This lab should be done in a group of 2 or more.
See the course overview and intro slides for acceptable bounds of
inter-group interaction.
Part 1: Build a simple web proxy:
A simple web client (HttpClient.java) is given. When started, it takes three arguments on the command line (proxy host name, proxy port number, URL), as follows:
>> HttpClient.java tera.cs.umn.edu 8887 www-users.itlabs.umn.edu/classes/Fall-2003/csci5131/test.txt
You can also test with a simple telnet session to the proxy (the telnet client replaces your Java client). However, to gather performance data it will be easier to use your Java client, since you can insert timers.
The web client connects to the web proxy and sends
the request (the third argument) over a socket. Your proxy accepts this request and
extracts the web server URL. After connecting to the web server, it generates
an HTTP request message and passes it to the web server (see the HTTP material discussed in class and/or on the website). When the web proxy gets
the response from the web server, it forwards the response to the client.
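The forwarding step described above can be sketched as follows. This is a minimal sketch, not the skeleton code: the class name `RequestBuilder`, its method names, the choice of HTTP/1.0, and the default server port 80 are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the provided skeleton code.
class RequestBuilder {
    // Returns {host, path} for a URL given with or without a leading "http://".
    static String[] splitUrl(String url) {
        String rest = url.startsWith("http://") ? url.substring(7) : url;
        int slash = rest.indexOf('/');
        if (slash < 0) {
            return new String[] { rest, "/" };
        }
        return new String[] { rest.substring(0, slash), rest.substring(slash) };
    }

    // Builds a minimal HTTP/1.0 GET request for the given URL.
    static String buildRequest(String url) {
        String[] parts = splitUrl(url);
        return "GET " + parts[1] + " HTTP/1.0\r\n"
             + "Host: " + parts[0] + "\r\n"
             + "\r\n";
    }

    // Sends the request to the web server (port 80 is an assumption) and
    // copies the raw response bytes to the client's output stream.
    static void forward(String url, OutputStream toClient) throws IOException {
        String host = splitUrl(url)[0];
        try (Socket server = new Socket(host, 80)) {
            OutputStream out = server.getOutputStream();
            out.write(buildRequest(url).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = server.getInputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                toClient.write(buf, 0, n);
            }
            toClient.flush();
        }
    }
}
```

With HTTP/1.0 the server closes the connection after the response, so reading until end of stream is enough here; a proxy that speaks HTTP/1.1 would need to handle persistent connections.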
Part 2: Implement a simple caching scheme
Requested files can be temporarily stored on the
machine where the web proxy is running. When the web proxy receives a request,
it checks whether the requested file is stored in the proxy's disk cache. If it is, the web
proxy returns the stored file. If it isn't, the web proxy forwards the request
to the web server and forwards the response from the web server to the client.
Before forwarding the response, it stores the file returned by the web server.
If there is no room in the cache, the web proxy deletes an old file from the
cache using a strategy of your choice, e.g. the LRU (Least Recently Used) strategy. Use an in-memory hash table that maps each URL to a table entry; the entry can contain residence information and other fields (up to you). This enables the proxy to quickly determine whether the file is cached without going to the OS. Such a structure could also enable in-memory caching. Put files below a certain size in an in-memory file cache. The disk file cache size and the in-memory file cache size should be
given as arguments when the proxy is started.
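One possible shape for the in-memory table with LRU eviction is sketched below. It is only an illustration: the class name `LruCache` and its methods are hypothetical, the capacity is counted in bytes, and the disk cache and residence information are left out. The methods are synchronized because several threads may share the table.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory cache keyed by URL; evicts least recently used entries.
class LruCache {
    private final long capacityBytes;
    private long usedBytes = 0;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, byte[]> entries =
            new LinkedHashMap<String, byte[]>(16, 0.75f, true);

    LruCache(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Returns the cached body, or null on a miss. A hit marks the entry as recently used.
    synchronized byte[] get(String url) {
        return entries.get(url);
    }

    // Checks for presence without changing the LRU order.
    synchronized boolean contains(String url) {
        return entries.containsKey(url);
    }

    synchronized void put(String url, byte[] body) {
        if (body.length > capacityBytes) {
            return; // larger than the whole cache; do not store it
        }
        byte[] old = entries.remove(url);
        if (old != null) {
            usedBytes -= old.length;
        }
        // Evict least recently used entries until the new body fits.
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (usedBytes + body.length > capacityBytes && it.hasNext()) {
            usedBytes -= it.next().getValue().length;
            it.remove();
        }
        entries.put(url, body);
        usedBytes += body.length;
    }
}
```

For a disk cache, the value type could instead be a small record holding the file path and size, with the eviction loop also deleting the evicted file.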
Part 3: Implement a multi-threaded web proxy handling multiple connections
The first step is implementing a web
proxy that can handle multiple connections from clients. The given skeleton code
(HttpProxy.java)
is a single-threaded process that can handle only one connection at a time. This
may not be a big problem if there are not too many requests from clients, but it is
unacceptable in a real proxy that might have hundreds or thousands of clients
contacting it simultaneously. In order to prepare your proxy for this real-world
situation, you need to modify the skeleton code to handle more than one
connection at a time using Java threads. Be careful: threads may be sharing the caching structures implemented in Part 2!
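A thread-per-connection accept loop might look like the sketch below. The names `ThreadedAcceptor` and `Handler` are hypothetical, and the provided HttpProxy.java may be organized differently; the point is only that each accepted socket is handed to its own thread.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical names; the provided skeleton code may be structured differently.
class ThreadedAcceptor implements Runnable {
    // Per-connection work, e.g. read the request, consult the cache, reply.
    interface Handler {
        void handle(Socket client) throws IOException;
    }

    private final ServerSocket listener;
    private final Handler handler;

    ThreadedAcceptor(ServerSocket listener, Handler handler) {
        this.listener = listener;
        this.handler = handler;
    }

    // Accept loop: every connection is served on its own thread,
    // so one slow client does not block the others.
    public void run() {
        while (!listener.isClosed()) {
            try {
                Socket client = listener.accept();
                new Thread(() -> serve(client)).start();
            } catch (IOException e) {
                break; // listener closed or failed; stop accepting
            }
        }
    }

    private void serve(Socket client) {
        try {
            handler.handle(client);
        } catch (IOException e) {
            // A failed connection should not take down the proxy.
        } finally {
            try { client.close(); } catch (IOException ignored) { }
        }
    }
}
```

Any structure the handlers share, such as the cache table from Part 2, still needs its own synchronization. A fixed-size thread pool is an alternative if creating one thread per connection proves too expensive.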
Part 4: Performance Evaluation
Evaluate the performance of
your proxy. Devise experiments to examine the benefits of caching and
multithreading. You should present performance as seen from the client
as a function of number of concurrent connections and size of files
retrieved. For caching experiments, you may want to consider both
"local" web servers and more "distant" ones. Submit a short
description of your experimental setup and performance results (tables
or graphs are fine).
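Client-side timers for these experiments could be as simple as the sketch below. The `Stopwatch` class is a hypothetical helper, not part of the provided HttpClient.java; it runs a task several times and reports the mean elapsed time.

```java
import java.util.concurrent.Callable;

// Hypothetical timing helper for client-side measurements.
class Stopwatch {
    // Runs the task `runs` times and returns the mean elapsed time in milliseconds.
    static double meanMillis(Callable<?> task, int runs) throws Exception {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e6 / runs;
    }
}
```

The task would typically wrap one complete request to the proxy. Comparing the first request for a URL (a cache miss) against later requests for the same URL (cache hits) is one way to show the benefit of caching.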
Part 5: Acquire statistics
Your proxy should keep
statistics on requests that go through the proxy. Again, be careful about the operations of concurrent threads. Your proxy has to open a log
file and save the statistics in the following format:
Date :: ClientHostName :: URL :: FileName :: MIME_Type :: Size :: Status
Tue 28 September 2003 12:45:00 :: 128.101.35.159 :: http://www-users.itlabs.umn.edu/classes/Fall-2003/csci5131/ :: test.txt :: text/plain :: 123 :: Allowed
Date: Date when the request is received
ClientHostName: Client host name that issues a request
URL: requested URL
FileName: requested file name
MIME_Type: MIME type of requested file
Size: the size of file sent back to the client
Status: Allowed/Denied
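A log writer for this format might be sketched as follows. The `StatsLog` class and its methods are hypothetical; the date is passed in already formatted so that the sketch stays independent of how you choose to produce it.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical logger; the field order follows the format given above.
class StatsLog {
    private final PrintWriter out;

    StatsLog(String path) throws IOException {
        // Append to the log file and flush after every line.
        this.out = new PrintWriter(new FileWriter(path, true), true);
    }

    // Builds one log line with " :: " between the seven fields.
    static String formatEntry(String date, String clientHost, String url,
                              String fileName, String mimeType, long size,
                              boolean allowed) {
        return String.join(" :: ", date, clientHost, url, fileName, mimeType,
                Long.toString(size), allowed ? "Allowed" : "Denied");
    }

    // synchronized so that lines written by concurrent threads do not interleave.
    synchronized void record(String date, String clientHost, String url,
                             String fileName, String mimeType, long size,
                             boolean allowed) {
        out.println(formatEntry(date, clientHost, url, fileName, mimeType, size, allowed));
    }
}
```

A date in the style of the example line could be produced with `java.text.SimpleDateFormat` and a pattern such as `EEE d MMMM yyyy HH:mm:ss` in an English locale.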
Grading Criteria
1) Basic Web proxy (10 points)
2) Multiple connections (25 points)
3) Statistics (20 points)
4) Caching (35 points)
5) Other criteria: quality of the solution, including cleanliness of the code,
documentation provided, and examples of the program in operation
demonstrating all features (10 points).
Submission
You have to submit all files
related to your full-fledged web proxy. In addition, you need to
submit a README file that explains how to test
your program. This file should also list the names of the files
submitted.
Extra Credit (5 points each)
There is no partial credit here. Either you have something working up to the level described or not. The README must also indicate how we can run/test the extra credit options.
Filtering by type of file / Contents (body) of file
Your proxy
should be able to deny or allow access to certain files according to
their MIME type and content. For example, your proxy may want
to prevent any picture files (jpg or gif) from going through the
connection. Your proxy should also be able to block access to files
containing certain key words (for example, "top secret", "proposal",
etc.). For this purpose, you should keep a file (filter.conf) that
contains the list of MIME types that should be refused and the
list of key words that cause a file to be refused if they appear in the
requested file. When your proxy is started, it should read this
configuration file and filter requests according to this
configuration. The format of "filter.conf" is up to you.
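Since the file format is left open, the sketch below assumes one invented for illustration: each line is either `mime <type>` or `keyword <phrase>`, and lines starting with `#` are comments. The `FilterPolicy` class and its methods are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical filter policy; the line format is an assumption, not a requirement.
class FilterPolicy {
    private final List<String> blockedMimeTypes = new ArrayList<String>();
    private final List<String> blockedKeywords = new ArrayList<String>();

    // Parses lines of the assumed form "mime <type>" or "keyword <phrase>".
    static FilterPolicy parse(List<String> lines) {
        FilterPolicy policy = new FilterPolicy();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (line.startsWith("mime ")) {
                policy.blockedMimeTypes.add(line.substring(5).trim().toLowerCase(Locale.ROOT));
            } else if (line.startsWith("keyword ")) {
                policy.blockedKeywords.add(line.substring(8).trim().toLowerCase(Locale.ROOT));
            }
        }
        return policy;
    }

    // Returns true if a response with this MIME type and body may be forwarded.
    boolean allows(String mimeType, String body) {
        if (blockedMimeTypes.contains(mimeType.toLowerCase(Locale.ROOT))) {
            return false;
        }
        String lower = body.toLowerCase(Locale.ROOT);
        for (String keyword : blockedKeywords) {
            if (lower.contains(keyword)) {
                return false;
            }
        }
        return true;
    }
}
```

At startup the lines could be read with `java.nio.file.Files.readAllLines(java.nio.file.Paths.get("filter.conf"))` and passed to `parse`. The keyword match here is case-insensitive, which is a design choice rather than something the assignment specifies.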
Cooperative Caching
Extend your caching scheme to allow a "web" of proxies to be
used or shared. There are many ways in which proxy caches can cooperate: they can be networked in a hierarchical topology such that if one proxy does not hold a document in its cache, another can be tried; proxy caches can learn about the existence of documents in other caches and store links to those caches; etc. Investigate a cooperative caching scheme from the literature (do a Google search on those keywords) and implement the scheme. See this interesting paper for limits on the benefits of such caching schemes.
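As a starting point only, the hierarchical idea can be modeled in a single process as below. This is not a scheme from the literature: the `CooperativeProxy` class and `Origin` interface are hypothetical, the caches are unbounded, and real proxies would talk to each other over sockets rather than by method calls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-process model of a proxy hierarchy.
class CooperativeProxy {
    // Stands in for the web server; used only by the root proxy.
    interface Origin {
        byte[] fetch(String url);
    }

    private final Map<String, byte[]> cache = new HashMap<String, byte[]>();
    private final CooperativeProxy parent; // null for the root of the hierarchy
    private final Origin origin;

    CooperativeProxy(CooperativeProxy parent, Origin origin) {
        this.parent = parent;
        this.origin = origin;
    }

    // On a local miss, try the parent proxy; only the root contacts the origin.
    synchronized byte[] get(String url) {
        byte[] body = cache.get(url);
        if (body == null) {
            body = (parent != null) ? parent.get(url) : origin.fetch(url);
            cache.put(url, body);
        }
        return body;
    }
}
```

Two child proxies that share a parent then cause only one fetch from the web server for a given URL, because the second child finds the document in the parent's cache.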