Non-deterministic premature EOF #637

Open
slve opened this issue Feb 11, 2021 · 5 comments

@slve

slve commented Feb 11, 2021

Story

I have a service that is

  1. listening to a stream of request bodies
  2. querying a third-party HTTP server using the blaze client
  3. forwarding the responses further down the stream

90% of the requests succeed, while 10% fail with org.http4s.InvalidBodyException: Received premature EOF.
The failures are not tied to particular requests:
if I retry the same request, there is a 90% chance it passes.

Reproduction

I managed to reproduce the issue in a controlled environment.

  1. I emulate the stream of request bodies by repeatedly
    emitting a single static request payload
    val body = "x".repeat(requestPayloadSize)
    at a fixed rate, using Stream.fixedRate

  2. I query the local test server
    val req = Request[IO](POST, uri).withEntity(body)
    ...
    simpleClient.stream.flatMap(c => c.stream(req)).flatMap(_.bodyText)
    that responds with a static payload
    val response = "x".repeat(responsePayloadSize)
    ...
    case POST -> Root => Ok(response)

  3. Finally, I print the index of the request along with the chunk size
    .evalMap(c => IO.delay(println(s"$i ${c.size}")))
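
For readers who want to try the server side of step 2 in isolation, here is a minimal sketch of a test server that answers every POST with a static payload. It assumes http4s 0.21.x with the blaze server and cats-effect 2; the object name, host, and port are placeholders of mine, and the payload size is simply the threshold quoted in this report. The actual code used for the reproduction is in the linked repository and may differ.

```scala
import cats.effect.{ExitCode, IO, IOApp}
import org.http4s.HttpRoutes
import org.http4s.dsl.io._
import org.http4s.implicits._
import org.http4s.server.blaze.BlazeServerBuilder
import scala.concurrent.ExecutionContext.global

object TestServerSketch extends IOApp {
  // Assumed value: the response size at which the report starts seeing EOFs.
  val responsePayloadSize: Int = 81161
  val response: String = "x" * responsePayloadSize

  // Every POST to the root path gets the same static body.
  val routes: HttpRoutes[IO] = HttpRoutes.of[IO] {
    case POST -> Root => Ok(response)
  }

  def run(args: List[String]): IO[ExitCode] =
    BlazeServerBuilder[IO](global)
      .bindHttp(8080, "localhost") // hypothetical host and port
      .withHttpApp(routes.orNotFound)
      .serve
      .compile
      .drain
      .as(ExitCode.Success)
}
```

Changing responsePayloadSize is then enough to probe either side of the threshold.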

Conclusion

Through extended experimentation I found threshold values ("magic numbers")
for the request and response payload sizes.
Below these payload sizes I can run the app for an extended period without any exceptions,
but once both the request and response sizes reach the thresholds,
the client eventually throws an EOF exception.

  // Numbers below vary on different computers
  // In my case, if the request payload size is 32603 or greater
  //  AND response payload size is 81161 or greater
  //  then we get EOF exception in some but not all cases
  // If however either of these payload sizes is lower then
  //  EOF exception doesn't occur, even if running for an extended period

Notes

You can find the test project at https://github.com/slve/http4s-eof;
its only Scala source file is
https://github.com/slve/http4s-eof/blob/master/src/main/scala/Http4sEof.scala.

Versions used: fs2 2.5.0, http4s 0.21.18 (see https://github.com/slve/http4s-eof/blob/master/build.sbt).

The server part is only there to aid the testing;
regardless of which server you run the test against,
the client will throw an EOF exception within a short period.

@rossabaker
Member

Thanks for the reproducible case. This is an excellent report.

I sent a PR: slve/http4s-eof#1. I think this clears up if you share one client for the whole app instead of creating a client per request.
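
To illustrate the difference between the two usages, here is a rough sketch assuming http4s 0.21.x with the blaze client, cats-effect 2, and fs2 2.x. It is not the code from the PR: the object name, the URI, the emission rate, and the helper chunkSizes are placeholders of mine, and the payload size is just the threshold quoted in the report.

```scala
import cats.effect.{ExitCode, IO, IOApp}
import fs2.Stream
import org.http4s.{Method, Request, Uri}
import org.http4s.client.Client
import org.http4s.client.blaze.BlazeClientBuilder
import scala.concurrent.ExecutionContext.global
import scala.concurrent.duration._

object SharedClientSketch extends IOApp {
  // Hypothetical target and payload; adjust to your own test server.
  val uri: Uri = Uri.unsafeFromString("http://localhost:8080/")
  val body: String = "x" * 32603
  val req: Request[IO] = Request[IO](Method.POST, uri).withEntity(body)

  // Sends one request on the given client and emits the length of each text chunk.
  def chunkSizes(client: Client[IO]): Stream[IO, Int] =
    client.stream(req).flatMap(_.bodyText).map(_.length)

  // Misuse: a new client (and connection pool) is allocated for every tick.
  val clientPerRequest: Stream[IO, Int] =
    Stream.fixedRate[IO](100.millis).flatMap { _ =>
      BlazeClientBuilder[IO](global).stream.flatMap(chunkSizes)
    }

  // Suggested usage: allocate the client once and reuse it for every request.
  val sharedClient: Stream[IO, Int] =
    BlazeClientBuilder[IO](global).stream.flatMap { client =>
      Stream.fixedRate[IO](100.millis).flatMap(_ => chunkSizes(client))
    }

  def run(args: List[String]): IO[ExitCode] =
    sharedClient
      .evalMap(n => IO(println(n)))
      .compile
      .drain
      .as(ExitCode.Success)
}
```

The only structural change is where the client is acquired: outside the fixed-rate stream rather than inside it.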

@slve
Author

slve commented Feb 12, 2021

Thanks for the quick response @rossabaker!

I've tested it side by side with my original version, where I inadvertently created a client on each request, and it definitely made a difference.
I merged your PR.

I also quickly tested your fixed version further, increasing both payload sizes (from ~32kB and ~81kB) to 500kB, and got the same EOF error message. I will get back to the topic in a bit.

EDIT: I've fine-tuned the thresholds and opened a PR with my changes: https://github.com/rossabaker/http4s-eof/pull/1/files?w=1

@rossabaker
Member

Okay, that was the only misuse I saw reviewing it last night. Going to give this the bug label and try to chase it more this weekend.

rossabaker added the bug label Feb 12, 2021

@slve
Author

slve commented Feb 12, 2021

Awesome, thank you @rossabaker.
Just to note, I ran two other experiments in different branches,
one using http4s/jdk-http-client and the other using softwaremill/sttp,
and ran into similar issues.

@eugene-cheverda

Hi @rossabaker

We're facing an EOF error in one of our services, and I'm using the project provided by @slve to reproduce the issue. Please see the findings below:

  1. With the blaze client, if the request size is bigger than 65397 bytes and the response is about 1 MB, the first request fails with EOF/Broken Pipe/Connection reset by peer. It happens almost immediately, regardless of how long the app has been running. The number 65397 is 65535 minus the default request headers length. I'm testing with a request size of 70000 bytes and a response size of 1000000 bytes.
  2. With trace logging turned on and a debugger attached, I was able to figure out that the request itself streams fine, but the exception occurs when streaming back the response. See point 3 below.
  3. There seems to be some sort of race in Http1Stage.scala, lines 219-230. If I set a breakpoint on the line currentBuffer = BufferTools.concatBuffers(currentBuffer, b), simulating a delay, the test passes. Without the breakpoint it fails almost immediately, falling into cb(eofCondition()). Before it falls into eofCondition, the Http1Stage.drainBody function is called and logs "HTTP body not read to completion. Dropping connection."
channelRead().onComplete {
  case Success(b) =>
    currentBuffer = BufferTools.concatBuffers(currentBuffer, b)
    go()

  case Failure(Command.EOF) =>
    cb(eofCondition())

  case Failure(t) =>
    logger.error(t)("Unexpected error reading body.")
    cb(Either.left(t))
}
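
As a quick sanity check of the arithmetic in point 1, the snippet below derives the header size implied by the two numbers quoted there. It uses only the Scala standard library; the object and member names are mine, and the 138-byte figure is inferred from 65535 - 65397 rather than measured.

```scala
object ThresholdArithmetic {
  // 65535 is the largest unsigned 16-bit value, as quoted in point 1.
  val maxUnsigned16: Int = 65535
  // Request body size above which the failure was observed.
  val observedThreshold: Int = 65397
  // Header length implied by the two numbers above (an inference, not a measurement).
  val impliedHeaderBytes: Int = maxUnsigned16 - observedThreshold

  // True when headers plus body would no longer fit in 65535 bytes.
  def exceedsThreshold(requestBodyBytes: Int): Boolean =
    requestBodyBytes + impliedHeaderBytes > maxUnsigned16
}
```

Under this reading, the 70000-byte test request is well past the boundary, while a body of exactly 65397 bytes still fits.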

Could you please take a look at the issue again and provide some updates or estimates on how soon it could be fixed?

Thanks in advance!

rossabaker transferred this issue from http4s/http4s May 24, 2022