Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible memory leak #50

Open
lidortal opened this issue May 14, 2021 · 8 comments
Open

Possible memory leak #50

lidortal opened this issue May 14, 2021 · 8 comments

Comments

@lidortal
Copy link

I use this package in my application and I observe that the memory consumption is increased over time.
I added goleak to search for leaks in one of the existing tests and found the following:

Goroutine 6 in state select, with github.com/gammazero/workerpool.(*WorkerPool).dispatch on top of the stack:
        goroutine 6 [select]:
        github.com/gammazero/workerpool.(*WorkerPool).dispatch(0x14000108080)
        	/Users/myuser/ws/workerpool/workerpool.go:184 +0x180
        created by github.com/gammazero/workerpool.New
        	/Users/myuser/ws/workerpool/workerpool.go:37 +0x108
        ]
@ALNBAND
Copy link

ALNBAND commented May 15, 2021

We are working with this library in production for few applications, and it works as expected; however, we notice this issue with dispatch/new a few weeks ago with a high load app running on AKS., When our application gets some restarts (after 10-14 days) due to high memory consumption, I suggest you upgrade to the latest version, which may help; however, we still saw that the graph is growing slowly. We are working on several solutions that fix this issue; however, not using workerpool directly ...

@gammazero
Copy link
Owner

When a workerpool is created, that goroutine (dispatch()) remains running for the life of the workerpool. It continues to accept tasks to feed to free workers, or queue until workers are available. The dispatch() goroutine ends only after calling Stop() or StopWait(). This does not appear to be a leak, but rather a workerpool that is still running, because it has not been shutdown.

As part of your testing, it may be useful to call Stop() and check that it returns, since it waits for all currently executing tasks to complete. If it does not return, then that indicates one of the workers is stuck in a goroutine that is not completing. If it does complete, then it may help narrow down where the leak is.

@lidortal
Copy link
Author

I used on of the existing tests and indeed there is a call to Stop()

@gammazero
Copy link
Owner

I have identified the issue, will provide a resolution ASAP. Thank you for reporting this.

@gammazero
Copy link
Owner

@lidortal The issue appears to be that goleak is not compatible with t.parallel and reports leaks when the two are used together. This is a known issue. I have updated the tests to no use t.parallel and instrumented each with a goleak check. After that, I was not able to find evidence of any leak. This change is in commit 1adbd90

Please confirm whether or not this fixes the issue you are seeing. If this resolves the issue, then close this issue and I will bump the version number.

@lidortal
Copy link
Author

@gammazero I observe a leak on my productive application, w/o using t.parallel:

         Goroutine 7 in state chan receive, with github.com/gammazero/workerpool.worker on top of the stack:
        goroutine 7 [chan receive]:
        github.com/gammazero/workerpool.worker(0x14000182720)
        	/Users/myuser/go/pkg/mod/github.com/gammazero/[email protected]/workerpool.go:239 +0x40
        created by github.com/gammazero/workerpool.startWorker
        	/Users/myuser/go/pkg/mod/github.com/gammazero/[email protected]/workerpool.go:234 +0x48

I wanted to reproduce it using one of the existing tests, please let me know if there is another way to isolate it.

@gammazero
Copy link
Owner

That is exactly the area I thought might be an issue, but could not reproduce it -- fix ariving shortly. Thank you very much for the investigation.

@gammazero
Copy link
Owner

This is now fixed in commit 85cc841

The issue what that there was a chance that a worker goroutine could still be starting after Stop() finished sending the shutdown signal to all workers. Stop could then exit before the worker was shutdown, leading goleak to detect an existing goroutine after Stop() returned.

The good news is that this was not an actual leak, since the stop signal was guaranteed to reach the worker and the worker would always exit. It was also impossible for a worker to be executing any work when Stop() exited, because the stop signal was sent on an unbuffered channel, and any worker that read from that channel would have had to have completed any previous work, and of course Stop() would have to wait for each worker to read its stop signal from the unbuffered channel.

Given the above guarantees, the only value of the fix in commit 85cc841 is to ensure that goleak and similar tests pass, and I am not sure if workerpool should contain any code that is strictly for the purpose of making certain tests pass. Maybe documentation is the better solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants