Debugging strange http.Response Read() behavior in Golang
With Dolt, you can push and pull a local MySQL-compatible database to and from a remote. Remotes can be managed with the dolt remote CLI command, and many types of remotes are supported: a separate directory, an S3 bucket, or any gRPC service that implements the ChunkStoreService protocol buffer definition. remotesrv is Dolt's open-source implementation of ChunkStoreService. It also serves a simple HTTP file server that is used to transfer data between the remote and the client.
Earlier this week we ran into an interesting issue related to the interaction between the Dolt CLI and remotesrv's HTTP file server. Fixing it took an understanding of the HTTP/1.1 protocol and some digging through the Golang source code.
In this blog, we'll discuss how Golang's net/http package automatically sets the Transfer-Encoding header of an HTTP response and how that can change the behavior of http.Response.Body Read calls on the client side.
A weird Dolt CLI error
This investigation started with a report from a Dolt user. They had set up remotesrv to host their Dolt database and were using the Dolt CLI to pull upstream changes into a local clone. While push worked perfectly, pull seemed to make some progress and then fail with a suspicious error:
throughput below minimum allowable
This particular error is suspicious since it reveals that the Dolt client failed to download data from remotesrv's HTTP file server at a minimum rate of 1024 bytes per second. Our initial assumption was that parallel downloads were causing some kind of congestion on the download path. We were wrong. It turned out that this error only occurred on large downloads, which are serialized, so congestion was unlikely. We dug a little deeper into how the throughput is measured and found something surprising.
How we measured throughput
For some context, let's start with an overview of Golang's io.Reader interface. The interface allows you to Read bytes from some source into a buffer b:
func (T) Read(b []byte) (n int, err error)
As part of its contract, it guarantees that it won't read more than len(b) bytes and that the number of bytes read into b is always returned as n. A particular Read call can return 0 bytes, 10 bytes, or even 134,232,001 bytes, as long as b is large enough. If the reader runs out of bytes to read, it returns an end-of-file (EOF) error that you can test for.
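For example, this standalone sketch reads a 12-byte string through a 5-byte buffer. Each Read returns however many bytes it could produce, and a final call reports io.EOF:
package main

import (
	"fmt"
	"io"
	"strings"
)

func main() {
	r := strings.NewReader("hello, world")
	buf := make([]byte, 5)
	for {
		// Read fills at most len(buf) bytes and reports how many it actually read.
		n, err := r.Read(buf)
		fmt.Printf("read %d bytes: %q\n", n, buf[:n])
		if err == io.EOF {
			break // the reader has no more bytes to give us
		}
		if err != nil {
			panic(err)
		}
	}
}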
When you make an HTTP call in Golang using the net/http package, the response body is available as an io.Reader. You can use Read on the reader to retrieve the bytes of the body. With the contract of the io.Reader in mind, we know that we may retrieve anywhere from 0 bytes to the entire body during any particular Read call.
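Applied to an HTTP response, that same loop looks something like the sketch below (the URL is illustrative; the imports needed are fmt, io, and net/http):
// readBody drains a response body one Read at a time and reports how many
// bytes each call returned; any value from 0 up to len(buf) is legal.
func readBody(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	buf := make([]byte, 32*1024)
	for {
		n, err := resp.Body.Read(buf)
		fmt.Printf("Read returned %d bytes\n", n)
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}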
During our investigation, we saw that downloads of 134,232,001 bytes were failing to meet our minimum throughput, but the reason why was not immediately apparent. Using Wireshark, we could see that data was being transferred fast enough, which suggested that the problem was in how the Dolt CLI measured throughput.
Here's some pseudo-code that describes how the throughput is measured:
type measurement struct {
	N int
	T time.Time
}

type throughputReader struct {
	io.Reader
	ms chan measurement
}

func (r throughputReader) Read(bs []byte) (int, error) {
	n, err := r.Reader.Read(bs)
	r.ms <- measurement{n, time.Now()}
	return n, err
}

func ReadNWithMinThroughput(r io.Reader, n int64, min_bps int64) ([]byte, error) {
	ms := make(chan measurement)
	defer close(ms)
	r = throughputReader{r, ms}
	bytes := make([]byte, n)
	go func() {
		for {
			select {
			case _, ok := <-ms:
				if !ok {
					return
				}
				// Add the sample to a window of samples.
			case <-time.After(1 * time.Second):
			}
			// Calculate the throughput by selecting a window of samples,
			// summing the sampled bytes read, and dividing by the window
			// length. If the throughput is less than |min_bps|, cancel our
			// context.
		}
	}()
	_, err := io.ReadFull(r, bytes)
	return bytes, err
}
The above code reveals the culprit for our issue. Note that if a single Read call takes a long time, no throughput samples arrive, and eventually our measurement code reports a throughput of 0 bytes per second and returns the error. This is further backed by the fact that small downloads completed, but larger downloads failed consistently.
But how do we prevent these large Reads, and moreover, what causes some reads to be large and others to be short? Let's investigate by dissecting how an HTTP response is built on the server and how it is parsed by the client.
Writing HTTP responses
In Golang, you use http.ResponseWriter to return data to the client. You can use the writer to write headers and also the body, but there is a lot of underlying logic that controls what headers are actually written and how the body is encoded.
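As a refresher, a handler can set headers explicitly before its first Write, or skip that entirely and let net/http fill in reasonable defaults. Here's a generic sketch (not the remotesrv handler):
func handler(w http.ResponseWriter, r *http.Request) {
	// Headers must be set before the first Write or WriteHeader call;
	// after that, net/http considers them already sent.
	w.Header().Set("Content-Type", "application/octet-stream")
	w.WriteHeader(http.StatusOK)

	// Everything written from here on becomes the response body.
	w.Write([]byte("some payload"))
}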
For example, in the HTTP file server we never set the Content-Type or Transfer-Encoding headers. We simply called Write once with a buffer that held the data we needed to return. But if we check the response headers with curl:
=> curl -sSL -D - http://localhost:8080/dolthub/test/53l5... -o /dev/null
HTTP/1.1 200 OK
Date: Wed, 09 Mar 2022 01:21:28 GMT
Content-Type: application/octet-stream
Transfer-Encoding: chunked
We can see that both the Content-Type and Transfer-Encoding headers are set! Furthermore, Transfer-Encoding is set to chunked!
Here's a comment we found in net/http/server.go explaining this:
// The Life Of A Write is like this:
//
// Handler starts. No header has been sent. The handler can either
// write a header, or just start writing. Writing before sending a header
// sends an implicitly empty 200 OK header.
//
// If the handler didn't declare a Content-Length up front, we either
// go into chunking mode or, if the handler finishes running before
// the chunking buffer size, we compute a Content-Length and send that
// in the header instead.
//
// Likewise, if the handler didn't set a Content-Type, we sniff that
// from the initial chunk of output.
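To see that decision in action, here's a small demo server we can run locally (purely illustrative, not remotesrv). One handler writes a few bytes and returns before the chunking buffer fills; the other writes a megabyte in a single call:
package main

import (
	"bytes"
	"log"
	"net/http"
)

func main() {
	// Small response: the handler finishes before the chunking buffer fills,
	// so net/http computes and sends a Content-Length header.
	http.HandleFunc("/small", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})

	// Large response: the single Write overflows the chunking buffer,
	// so net/http switches to Transfer-Encoding: chunked.
	http.HandleFunc("/large", func(w http.ResponseWriter, r *http.Request) {
		w.Write(bytes.Repeat([]byte("x"), 1<<20))
	})

	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
Checking the headers with curl -sD - http://localhost:8080/small -o /dev/null shows a Content-Length, while /large comes back with Transfer-Encoding: chunked.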
And this is an explanation of chunked transfer encoding from Wikipedia:
Chunked transfer encoding is a streaming data transfer mechanism available in version 1.1 of the Hypertext Transfer Protocol (HTTP). In chunked transfer encoding, the data stream is divided into a series of non-overlapping "chunks". The chunks are sent out and received independently of one another. No knowledge of the data stream outside the currently-being-processed chunk is necessary for both the sender and the receiver at any given time.
Each chunk is preceded by its size in bytes. The transmission ends when a zero-length chunk is received. The chunked keyword in the Transfer-Encoding header is used to indicate chunked transfer.
An early form of the chunked transfer encoding was proposed in 1994.[1] Chunked transfer encoding is not supported in HTTP/2, which provides its own mechanisms for data streaming.[2]
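Concretely, each chunk on the wire is a hexadecimal length, a CRLF, the chunk's bytes, and another CRLF, and a zero-length chunk terminates the body. For example, "helloworld!" sent as two chunks would look like this (CRLFs written out explicitly):
5\r\n
hello\r\n
6\r\n
world!\r\n
0\r\n
\r\n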
Reading HTTP responses
To read an HTTP response's body, net/http provides Response.Body, which is an io.Reader. It also has logic that hides the implementation details of HTTP. Regardless of what transfer encoding was used, the provided io.Reader returns only the bytes that were originally written to the response. It automatically "de-chunks" the chunked response.
We investigated this "de-chunking" in more detail to understand why it can cause a large Read.
Writing and reading chunks
If you look at the implementation of chunkedWriter, you can see that every Write produces a new chunk regardless of its size:
// Write the contents of data as one chunk to Wire.
func (cw *chunkedWriter) Write(data []byte) (n int, err error) {

	// Don't send 0-length data. It looks like EOF for chunked encoding.
	if len(data) == 0 {
		return 0, nil
	}

	if _, err = fmt.Fprintf(cw.Wire, "%x\r\n", len(data)); err != nil {
		return 0, err
	}
	if n, err = cw.Wire.Write(data); err != nil {
		return
	}
	if n != len(data) {
		err = io.ErrShortWrite
		return
	}
	if _, err = io.WriteString(cw.Wire, "\r\n"); err != nil {
		return
	}
	if bw, ok := cw.Wire.(*FlushAfterChunkWriter); ok {
		err = bw.Flush()
	}
	return
}
In remotesrv, we first loaded the requested data into a buffer and then called Write once, so we sent one huge chunk over the wire.
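In other words, the old write path looked roughly like this (a hypothetical sketch; loadRequestedRange is an illustrative helper, not a real remotesrv function):
func handleGetRange(w http.ResponseWriter, r *http.Request) {
	// Buffer the entire requested range in memory. For large requests this
	// can be well over 100 MB.
	data, err := loadRequestedRange(r)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	// A single Write call becomes a single, equally large chunk on the wire.
	w.Write(data)
}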
In chunkedReader, we see that its Read call will read an entire chunk from the wire at a time:
func (cr *chunkedReader) Read(b []uint8) (n int, err error) {
	for cr.err == nil {
		if cr.checkEnd {
			if n > 0 && cr.r.Buffered() < 2 {
				// We have some data. Return early (per the io.Reader
				// contract) instead of potentially blocking while
				// reading more.
				break
			}
			if _, cr.err = io.ReadFull(cr.r, cr.buf[:2]); cr.err == nil {
				if string(cr.buf[:]) != "\r\n" {
					cr.err = errors.New("malformed chunked encoding")
					break
				}
			} else {
				if cr.err == io.EOF {
					cr.err = io.ErrUnexpectedEOF
				}
				break
			}
			cr.checkEnd = false
		}
		if cr.n == 0 {
			if n > 0 && !cr.chunkHeaderAvailable() {
				// We've read enough. Don't potentially block
				// reading a new chunk header.
				break
			}
			cr.beginChunk()
			continue
		}
		if len(b) == 0 {
			break
		}
		rbuf := b
		if uint64(len(rbuf)) > cr.n {
			rbuf = rbuf[:cr.n]
		}
		var n0 int
		/*
			Annotation by Dhruv:
			This Read call directly calls Read on |net.Conn| if |rbuf| is larger
			than the underlying |bufio.Reader|'s buffer size.
		*/
		n0, cr.err = cr.r.Read(rbuf)
		n += n0
		b = b[n0:]
		cr.n -= uint64(n0)
		// If we're at the end of a chunk, read the next two
		// bytes to verify they are "\r\n".
		if cr.n == 0 && cr.err == nil {
			cr.checkEnd = true
		} else if cr.err == io.EOF {
			cr.err = io.ErrUnexpectedEOF
		}
	}
	return n, cr.err
}
Since every request from our HTTP file server is served and read as a single chunk, the return time for a Read call completely depends on the size of the requested data. In the cases where we downloaded a large range of data (134,232,001 bytes), those Read calls consistently took too long and tripped the throughput check.
Fixing the issue
We had two potential solutions to fix this problem. We could produce smaller chunks by breaking up the http.ResponseWriter Write calls, or we could explicitly set the Content-Length header, which would bypass chunked transfer encoding entirely.
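The second option would have looked something like this (a sketch; it assumes the payload size is known before writing and uses strconv for the header value):
// writeWithLength declares the body size up front. When Content-Length is
// set before the first Write, net/http never switches to chunked encoding.
func writeWithLength(w http.ResponseWriter, data []byte) {
	w.Header().Set("Content-Length", strconv.Itoa(len(data)))
	w.Write(data)
}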
We decided to break up the http.ResponseWriter Write calls by using io.Copy. io.Copy produces Writes that are at most 32 * 1024 (32,768) bytes. To use it, we refactored our code to provide the needed data as an io.Reader instead of a large buffer. Using io.Copy is an idiomatic pattern for piping data between an io.Reader and an io.Writer.
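The reworked write path looks roughly like this (simplified; serveRange and its reader parameter are illustrative names, not the exact remotesrv code):
// serveRange streams the requested data instead of buffering all of it.
// io.Copy moves the data through a 32 KB buffer, so each Write, and
// therefore each chunk on the wire, is at most 32 KB.
func serveRange(w http.ResponseWriter, rd io.Reader) error {
	_, err := io.Copy(w, rd)
	return err
}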
You can view the PR with these changes here.
Conclusion
In summary, we discovered that when writing a response, http.ResponseWriter will use chunked transfer encoding if no Content-Length is set and the size of the write is larger than the chunking buffer size. Correspondingly, when we read a response, chunkedReader will try to Read an entire chunk at a time from the underlying net.Conn. Since remotesrv wrote a very large chunk, the Read calls on the Dolt CLI consistently took too long and caused the throughput error to be reported. We fixed the issue by writing smaller chunks.
The net/http package and other Golang standard libraries are a pleasure to work with. It's easy to read the source since most of the standard library is written in Go itself and can be viewed on GitHub. Although there was little documentation for the specific problem at hand, it only took an hour or two of digging to get to the root cause. I'm personally very excited to continue working on Dolt and deepen my knowledge of Go.
If you're excited about Go too and want to chat about any of the above, reach out to us on Discord!