11 January, 2007

Fun with HTTP

The HTTP specification can be a bit daunting, and I was curious what the data in a POST request actually looks like. So, just for fun, I knocked out a little Ruby WEBrick servlet that echoes back the request line, headers, and body of any HTTP GET or POST request it receives. It looks like this:

#!/usr/bin/env ruby
#
#  Created by Toby Tripp on 2007-01-11.

require 'webrick'

class PostDumper <
  WEBrick::HTTPServlet::AbstractServlet
  
  # Reload file for each request, instantly
  # updating the server with code changes 
  # without needing a restart.
  #
  def PostDumper.get_instance( config, *options )
    load __FILE__
    PostDumper.new config, *options
  end
  
  def do_GET( request, response )
    response.status = 200
    response['Content-Type'] = "text/plain"
    response.body = dump_request( request )
  end
  
  def do_POST( request, response )
    response.status = 200
    response['Content-Type'] = "text/plain"
    response.body = dump_request( request )
    # request.body is nil when the POST has no body
    response.body << request.body.to_s
  end
  
  def dump_request( request )
    # Build a new string rather than appending to
    # request.request_line in place.
    request.request_line + "\r\n" +
      request.raw_header.join( "" ) + "\r\n"
  end
end

if __FILE__ == $0
  # ARGV values are strings, so convert to an Integer
  port = (ARGV[0] || 2000).to_i
  server = WEBrick::HTTPServer.new(
     :Port => port,
     :DocumentRoot => File.join( Dir.pwd, "/html" )
   )
  server.mount "/dump", PostDumper
  trap( "INT" ) { server.shutdown }
  server.start
end

If I run the above and point my browser at http://localhost:2000/dump I get:

GET /dump HTTP/1.1

Host: localhost:2000
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.1) Gecko/20061026 BonEcho/2.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

I've also got a little HTML form that uses the same URL as its action. Submitting that form gives me:

POST /dump HTTP/1.1

Host: localhost:2000
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.1) Gecko/20061026 BonEcho/2.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost:2000/form.html
Content-Type: application/x-www-form-urlencoded
Content-Length: 85

submit=Submit&text_input=Single+line+text&file_input=&textarea=Multi-line+text+input.

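There! For a form submitted as application/x-www-form-urlencoded (the Content-Type header above), the POST body is just URL-encoded key/value pairs. Each key is joined to its value with an equals sign, pairs are separated by ampersands, and spaces are encoded as plus signs.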
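
To see those pairs as Ruby data, the body can be decoded with the standard library. This is only a minimal sketch, assuming a reasonably modern Ruby that ships URI.decode_www_form; the body string is copied from the dump above, and the variable names are just for illustration.

```ruby
require 'uri'

# The POST body from the dump above, copied verbatim.
body = "submit=Submit&text_input=Single+line+text&file_input=&textarea=Multi-line+text+input."

# The byte count is what the browser sent as Content-Length.
puts body.bytesize

# decode_www_form splits on "&" and "=", and turns "+" and
# %XX escapes back into the original characters.
pairs = URI.decode_www_form( body )
pairs.each do |key, value|
  puts "#{key} => #{value.inspect}"
end
```

The first line printed is 85, matching the Content-Length header. The file_input value comes through as an empty string, presumably because no file was chosen in the form.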

The response from the server can also be instructive. If I use telnet to generate a GET request:

~$ telnet localhost 2000
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1

HTTP/1.1 200 OK 
Last-Modified: Thu, 11 Jan 2007 16:36:39 GMT
Connection: Keep-Alive
Date: Thu, 11 Jan 2007 20:09:56 GMT
Content-Type: text/html
Etag: 2a8766-169-45a66797
Server: WEBrick/1.3.1 (Ruby/1.8.2/2004-12-25)
Content-Length: 361

[...]

There's the response from the server. One thing this output can't show you about HTTP: every line is terminated with the '\r\n' (carriage return, line feed) sequence. That includes the blank lines, which matter to the protocol: a blank line separates the headers (along with the request or status line) from the content.
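
To make the line endings visible, here is a small sketch that writes a request out by hand with explicit "\r\n" terminators and then splits it at the first blank line. The request text and the tiny a=1&b=2 body are made up for illustration, not captured from the server.

```ruby
# A hand-written request; every header line ends in "\r\n".
raw = "POST /dump HTTP/1.1\r\n" +
      "Host: localhost:2000\r\n" +
      "Content-Type: application/x-www-form-urlencoded\r\n" +
      "Content-Length: 7\r\n" +
      "\r\n" +
      "a=1&b=2"

# The first blank line ("\r\n\r\n") ends the header section;
# everything after it is the content.
head, body = raw.split( "\r\n\r\n", 2 )
request_line, *headers = head.split( "\r\n" )

puts request_line.inspect
headers.each { |line| puts line.inspect }
puts body.inspect
```

Printing with inspect keeps any stray control characters visible, which is handy when a hand-typed request is missing a terminator.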

There you have it: a WEBrick servlet in fewer than 50 lines of code. Did I miss anything?
