I've been writing a web crawler that generates output in the Internet Archive ARC file format. I'm used to crawlers like wget, which just record the contents of HTTP responses. ARC is different: it wants the headers as well. I knew about the Content-Length header, but I didn't know about chunked transfer encoding. With it, the body of the response is broken into chunks, and each one is sent as the chunk size in hex, then a CRLF, then the chunk data, then another CRLF; a chunk of size zero marks the end. This matters for writing ARC files because I need to record the data as it comes over the wire, not after it has been decoded. My crawler is written in Python, but I couldn't use urllib or even httplib; I had to go lower-level and use the socket module.
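
To make the chunk framing concrete, here is a minimal sketch of walking a chunked body in Python. It is not the crawler's actual code: the function name `split_chunked` and the sample bytes are made up for illustration, and it assumes the whole body has already been read from the socket into one bytes object. It skips over chunk extensions and trailer lines rather than interpreting them.

```python
def split_chunked(raw):
    """Walk a chunked body and return (decoded_body, bytes_consumed).

    `raw` is assumed to start right after the blank line that ends the
    response headers. Hypothetical helper, for illustration only.
    """
    decoded = []
    pos = 0
    while True:
        # The size line is hex digits, optionally followed by ";extension".
        line_end = raw.index(b"\r\n", pos)
        size_field = raw[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        decoded.append(raw[pos:pos + size])
        pos += size
        if raw[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += 2
    # After the last chunk come optional trailer lines, then an empty line.
    while True:
        line_end = raw.index(b"\r\n", pos)
        is_empty = line_end == pos
        pos = line_end + 2
        if is_empty:
            break
    return b"".join(decoded), pos


# Sample wire bytes (invented for this example).
wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
body, consumed = split_chunked(wire)
```

For an ARC record, the bytes to keep would be `wire[:consumed]`, framing included, rather than the decoded `body`. The decoder is only needed to find out where the response ends.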