In this assignment, you will implement a web proxy that passes requests and data between multiple web clients and web servers, concurrently. This will give you a chance to get to know one of the most popular application protocols on the Internet -- the Hypertext Transfer Protocol (HTTP) -- and give you an introduction to the Berkeley sockets API. When you're done with the assignment, you should be able to configure your web browser to use your personal proxy server as a web proxy.
The Hypertext Transfer Protocol (HTTP) is the protocol used for communication on the web: it defines how your web browser requests resources from a web server and how the server responds. For simplicity, in this assignment we will deal only with version 1.0 of HTTP, defined in detail in RFC 1945. You may refer to that RFC while completing this assignment, but our instructions should be self-contained.
HTTP communications happen in the form of transactions; a transaction consists of a client sending a request to a server and then reading the response. Request and response messages share a common basic format:
The initial line and header lines are each followed by a "carriage-return line-feed" (\r\n) signifying the end-of-line.
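Because every initial line and header line ends in \r\n, and in HTTP/1.0 an empty line closes the header section, a receiver can recognise the end of the headers by looking for \r\n\r\n. The following is a minimal sketch of that check; `find_header_end` is a hypothetical helper name and is not part of the provided skeleton code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the skeleton code): returns the number of
 * bytes up to and including the blank line ("\r\n\r\n") that ends the
 * header section, or -1 if buf does not yet hold a complete header section. */
static long find_header_end(const char *buf, size_t len)
{
    static const char marker[] = "\r\n\r\n";
    const size_t mlen = sizeof(marker) - 1;

    for (size_t i = 0; i + mlen <= len; i++) {
        if (memcmp(buf + i, marker, mlen) == 0)
            return (long)(i + mlen);
    }
    return -1;
}
```

A proxy could call this after each read and keep reading from the client while the result is -1.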
For most common HTTP transactions, the protocol boils down to a relatively simple series of steps (important sections of RFC 1945 are in parentheses):
It's fairly easy to see this process in action without using a web browser. From a Unix prompt, type:
This opens a TCP connection to the server at www.yahoo.com listening on port 80 (the default HTTP port). You should see something like this:
Type the following:
and hit enter twice. You should see something like the following:
There may be some additional pieces of header information as well: cookies being set, instructions to the browser or proxy on caching behavior, etc. What you are seeing is exactly what your web browser sees when it goes to the Yahoo home page: the HTTP status line, the header fields, and finally the HTTP message body, consisting of the HTML that your browser interprets to create a web page. You may notice here that the server responds with HTTP 1.1 even though you requested 1.0. Some web servers refuse to serve HTTP 1.0 content.
Ordinarily, HTTP is a client-server protocol. The client (usually your web browser) communicates directly with the server (the web server software). However, in some circumstances it may be useful to introduce an intermediate entity called a proxy. Conceptually, the proxy sits between the client and the server. In the simplest case, instead of sending requests directly to the server, the client sends all its requests to the proxy. The proxy then opens a connection to the server and passes on the client's request. The proxy receives the reply from the server and then sends that reply back to the client. Notice that the proxy is essentially acting as both an HTTP client (to the remote server) and an HTTP server (to the initial client).
Why use a proxy? There are a few possible reasons:
Your task is to build a web proxy capable of accepting HTTP requests, forwarding requests to remote (origin) servers, and returning response data to a client. The proxy MUST handle concurrent requests efficiently by using the epoll() system call to multiplex I/O operations. This will allow the proxy to manage multiple connections simultaneously without creating a new process for each client request.
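As a rough illustration of multiplexing with epoll, the sketch below registers descriptors with an epoll instance and collects the ones that are ready. The helper names `watch_fd` and `collect_ready` and the batch size of 64 events are assumptions made for this example; they are not part of the skeleton code.

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>

enum { MAX_EVENTS = 64 };   /* assumed batch size for this sketch */

/* Register fd with the epoll instance for the given events
 * (e.g. EPOLLIN for "readable"). Returns 0 on success, -1 on error. */
static int watch_fd(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = events;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait up to timeout_ms for activity and copy the ready descriptors into
 * fds (at most max entries). Returns how many are ready, or -1 on error. */
static int collect_ready(int epfd, int *fds, int max, int timeout_ms)
{
    struct epoll_event ready[MAX_EVENTS];
    if (max > MAX_EVENTS)
        max = MAX_EVENTS;

    int n = epoll_wait(epfd, ready, max, timeout_ms);
    for (int i = 0; i < n; i++)
        fds[i] = ready[i].data.fd;
    return n;
}
```

In a proxy, the main loop would typically create the instance once with epoll_create1(0), watch the listening socket, and then repeatedly handle each ready descriptor: accept on the listening socket, read or write on the others.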
You will only be responsible for implementing the GET method.
If the proxy encounters any errors while processing a request (e.g., issues with connecting to the origin server or internal logic errors), it should return a "500 Internal Server Error" response. Additionally, if the client sends a malformed request that cannot be parsed, the proxy should return a "400 Bad Request" response. All other request methods received by the proxy should elicit a "501 Not Implemented" error (see RFC 1945 section 9.5 - Server Error).
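The three error responses above can be produced by a small formatting routine. This is only a sketch: the function names are hypothetical, the response carries no headers or body, and treating any other status as 500 is an assumption of the example rather than a requirement of the assignment.

```c
#include <stddef.h>
#include <stdio.h>

/* Reason phrases for the status codes named in the assignment text.
 * Anything else is reported as an internal error (sketch assumption). */
static const char *reason_phrase(int status)
{
    switch (status) {
    case 400: return "Bad Request";
    case 501: return "Not Implemented";
    default:  return "Internal Server Error";
    }
}

/* Format a header-less, body-less HTTP/1.0 error response into buf.
 * Returns the number of bytes written, or -1 if buf is too small. */
static int format_error(char *buf, size_t cap, int status)
{
    if (status != 400 && status != 501)
        status = 500;
    int n = snprintf(buf, cap, "HTTP/1.0 %d %s\r\n\r\n",
                     status, reason_phrase(status));
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```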
This assignment can be completed in either C or C++. It should compile and run (using g++) without errors or warnings on the FC 010 cluster, producing a binary called proxy that takes as its first argument a port to listen on. Don't use a hard-coded port number.
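Reading the port from the first command-line argument might look like the sketch below. `parse_port` is a hypothetical helper name; rejecting port 0 and anything above 65535 is a choice made for this example.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal TCP port from a command-line argument.
 * Returns the port (1-65535), or -1 if the text is not a valid port. */
static int parse_port(const char *text)
{
    char *end = NULL;

    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -1;              /* empty, non-numeric, or trailing junk */
    if (value < 1 || value > 65535)
        return -1;              /* outside the TCP port range */
    return (int)value;
}
```

A main function could call parse_port(argv[1]) after checking that argc is at least 2, and print a usage message and exit if the result is -1.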
You shouldn't assume that your server will be running on a particular IP address, or that clients will be coming from a pre-determined IP.
When your proxy starts, the first thing that it will need to do is create a socket that it can use to listen for incoming connections. Your proxy should listen on the port specified on the command line and wait for incoming client connections.
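A minimal sketch of setting up the listening socket follows. It assumes IPv4 only, binds to every local address so that no particular IP is hard-coded, and uses the hypothetical helper name `open_listener`; setting SO_REUSEADDR is a convenience for quick restarts, not something the assignment requires.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to every local IPv4 address on the given port
 * and start listening. Returns the listening descriptor, or -1 on error. */
static int open_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow the port to be reused right after the proxy is restarted. */
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* no fixed IP address */
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned descriptor is the one the proxy would later pass to accept() whenever it becomes readable.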
Once a client has connected, the proxy should read data from the client and then check for a properly-formatted HTTP request -- but don't worry, we have provided you with libraries that parse the HTTP request lines and headers. Specifically, you will use our libraries to ensure that the proxy receives a request that contains a valid request line:
All other headers just need to be properly formatted:
In this assignment, client requests to the proxy must be in their absolute URI form (see RFC 1945, Section 5.1.2), for example:
Your browser will send absolute URIs if properly configured to explicitly use a proxy (as opposed to the transparent on-path proxies that some ISPs deploy, unbeknownst to their users). On the other hand, your proxy should issue requests to the web server with properly specified relative URLs, for example:
An invalid request from the client should be answered with an appropriate error code, i.e. "Bad Request" (400) or "Internal Server Error" (500), and "Not Implemented" (501) for valid HTTP methods other than GET. Similarly, if headers are not properly formatted for parsing, your proxy should also generate a type-400 message.
We have provided a parsing library to do string parsing on the header of the request. This library is in proxy_parse.[c|h] in the skeleton code. The library can parse the request into a structure called ParsedRequest, which has fields for things like the host name (domain name) and the port. It also parses the custom headers into a set of ParsedHeader structs, each of which contains a key and a value corresponding to the header. You can look up headers by key and modify them. The library can also recompile the headers into a string given the information in the structs.
More details and sample usage are available in proxy_parse.h, along with example code on how to use the library. This library can also be used to verify that the headers are in the correct format, since the parsing functions return error codes if this is not the case.
Once the proxy sees a valid HTTP request, it will need to parse the requested URL. The proxy needs at least three pieces of information: the requested host, the requested port, and the requested path. See the URL(7) manual page for more info. You will need to parse the absolute URL specified in the request line. You can use the parsing library to help you. If the hostname indicated in the absolute URL does not have a port specified, you should use the default HTTP port 80.
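The provided parsing library already extracts these fields, so the sketch below is only meant to make the three pieces and the default-port rule concrete. `struct url_parts` and `split_url` are hypothetical names invented for this example, and the buffer sizes are arbitrary.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct url_parts {              /* hypothetical type, for illustration only */
    char host[256];
    int  port;
    char path[1024];
};

/* Split "http://host[:port][/path]" into host, port and path.
 * Returns 0 on success, -1 if the URL is not in that form. */
static int split_url(const char *url, struct url_parts *out)
{
    static const char scheme[] = "http://";
    if (strncmp(url, scheme, sizeof(scheme) - 1) != 0)
        return -1;

    const char *host = url + sizeof(scheme) - 1;
    size_t authority_len = strcspn(host, "/");      /* "host" or "host:port" */
    const char *path = host + authority_len;        /* "" or "/..." */

    const char *colon = (const char *)memchr(host, ':', authority_len);
    size_t host_len = colon ? (size_t)(colon - host) : authority_len;
    if (host_len == 0 || host_len >= sizeof(out->host))
        return -1;
    memcpy(out->host, host, host_len);
    out->host[host_len] = '\0';

    out->port = 80;                                 /* default HTTP port */
    if (colon) {
        char *end = NULL;
        long p = strtol(colon + 1, &end, 10);
        if (end == colon + 1 || end != path || p < 1 || p > 65535)
            return -1;
        out->port = (int)p;
    }

    /* An absolute URL with no path refers to the root resource. */
    int n = snprintf(out->path, sizeof(out->path), "%s", *path ? path : "/");
    return (n < 0 || (size_t)n >= sizeof(out->path)) ? -1 : 0;
}
```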
Once the proxy has parsed the URL, it can make a connection to the requested host (using the appropriate remote port, or the default of 80 if none is specified) and send the HTTP request for the appropriate resource. The proxy should always send the request in the relative URL + Host header format regardless of how the request was received from the client:
Accept from client:
When sending requests to the remote server through the proxy, you should change the User-Agent header to "proxy309/1.0" before forwarding the request to the server.
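Putting the relative URL, the Host header, and the replaced User-Agent together, the request sent to the origin server could be formatted as in the sketch below. `build_origin_request` is a hypothetical helper. The sketch emits only the Host, User-Agent, and Connection: close headers and appends the port to Host when it is not 80; a real proxy would also forward the client's other headers, for instance by using the provided parsing library.

```c
#include <stddef.h>
#include <stdio.h>

/* Format the request the proxy sends to the origin server: relative path in
 * the request line, then Host, User-Agent and Connection: close headers.
 * Returns the request length, or -1 if buf is too small. */
static int build_origin_request(char *buf, size_t cap,
                                const char *host, int port, const char *path)
{
    int n;

    if (port == 80) {
        n = snprintf(buf, cap,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "User-Agent: proxy309/1.0\r\n"
                     "Connection: close\r\n"
                     "\r\n", path, host);
    } else {
        n = snprintf(buf, cap,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s:%d\r\n"
                     "User-Agent: proxy309/1.0\r\n"
                     "Connection: close\r\n"
                     "\r\n", path, host, port);
    }
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```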
Before sending the request to the remote server:
After changing the User-Agent header:
The remote server used for grading will check the specified User-Agent and modify it to x-user-agent, which will be included in the HTTP response.
Response example:
After the response from the remote server is received, the proxy should send the response message as-is to the client via the appropriate socket. To be strict, the proxy would be required to ensure that a Connection: close is present in the server's response, to let the client decide whether it should close its end of the connection after receiving the response. However, checking this is not required in this assignment, for the following reasons. First, a well-behaved server would respond with a Connection: close anyway, given that we ensure that we sent the server a close token. Second, we configure Firefox to always send a Connection: close by setting keepalive to false. Finally, we wanted to simplify the assignment so you wouldn't have to parse the server response.
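Relaying the response as-is means copying bytes from the server socket to the client socket without interpreting them. One detail worth illustrating is that write() may accept fewer bytes than requested. The sketch below handles that for a blocking descriptor; `write_all` is a hypothetical helper name. With non-blocking sockets under epoll, an EAGAIN result would instead mean "wait for EPOLLOUT and retry later", which this sketch does not cover.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes from data to a blocking descriptor, retrying on
 * short writes and on EINTR. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *data, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = write(fd, data + sent, len - sent);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;
}
```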
Run your proxy with the following command:
where port is the port number that the proxy should listen on. As a basic test of functionality, try requesting a page using telnet:
If your proxy is working correctly, the headers and HTML of the Google homepage should be displayed on your terminal screen. Notice here that we request the absolute URL (http://www.google.com/) instead of just the relative URL (/). A good sanity check of proxy behavior would be to compare the HTTP response (headers and body) obtained via your proxy with the response from a direct telnet connection to the remote server. Additionally, try requesting a page using telnet concurrently from two different shells.
For a slightly more complex test, you can configure your web browser to use your proxy server as its web proxy. See the section below for details.
An example HTTP server is set up at teemo.kaist.ac.kr:12345, where you can connect and check sample HTTP responses.
You can test your proxy by connecting to the test server using the following command:
After sending the request, you should receive a response from the test server. The response will include the User-Agent header that you sent in the request.
In order to build your proxy you will need to learn and become comfortable programming sockets. The Berkeley sockets library is the standard method of creating network systems on Unix. There are a number of functions that you will need to use for this assignment:
You can find the details of these functions in the Unix man pages (most of them are in section 2) and in the Stevens Unix Network Programming book, particularly chapters 3 and 4. Other sections you may want to browse include the client-server example system in chapter 5 (you will need to write both client and server code for this assignment) and the name and address conversion functions in chapter 9.
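On the client side of the proxy, name resolution and connection setup can be combined as in the sketch below. It uses getaddrinfo(), which is one standard way to convert a host name into socket addresses; whether the course expects this call or an older one such as gethostbyname() is not stated in this handout. `connect_to_host` is a hypothetical helper name, and the connect here is blocking, which an epoll-based design may want to replace with a non-blocking connect.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve host and open a TCP connection to it on the given port.
 * Returns the connected descriptor, or -1 if every address fails. */
static int connect_to_host(const char *host, int port)
{
    char service[16];
    snprintf(service, sizeof(service), "%d", port);

    struct addrinfo hints;
    struct addrinfo *results = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;    /* TCP */

    if (getaddrinfo(host, service, &hints, &results) != 0)
        return -1;

    int fd = -1;
    for (struct addrinfo *ai = results; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(results);
    return fd;
}
```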
epoll, select, and related system calls.
You can find the details of these functions in the Unix man pages:
man 7 epoll
man 2 select
Use KAIST KLMS to submit your assignments. Your submission should be one gzipped tar file whose name is YourStudentID_assign3.tar.gz. For example, if your student ID is 20191234, please name the file 20191234_assign3.tar.gz.
Create a local directory named 'YourStudentID_assign3' and place all your files in it. Then tar that directory to create your submission file. Please refer here for how to archive your assignment.
Your submission needs to include the following files:
Makefile to build your proxy. Running make should produce an executable named proxy, and make clean should remove all build artifacts.
proxy_parse.c and proxy_parse.h files for handling HTTP request parsing.
Your submission file should look like this: