Potential memory leak during inference? #2081
Oh wow, that is a lot of GPU memory. With your sleap environment activated, can you let us know the output of the command? Thanks! Elizabeth
With the first screenshot you sent, it looks like it is trying to use the CPU. With the second screenshot, I can partially see that it is using GPU 0, which is the one we want to use. Do you mind copying and pasting the entire command and output from the terminal instead of the screenshots? Thanks!
Hi @eberrigan, thanks for helping me work on this puzzle. Here's a zipped file with the models (centroid and centered instance), a demo video to run inference on, and my logs from running sleap-track. You can see that even for a short video, when I run sleap-track, the GPU's dedicated memory is relatively unused, but the GPU's shared (swap) memory keeps increasing until the rig is out of RAM. At that point in the logs, you can see that something changes (it stalls at 57% completed when all 128 GB of RAM are saturated), then something gets adjusted and it completes. It also gets stuck at 100% with the green bar for several minutes, with all 24 of my cores running at 50%; not sure what that means. Thanks! Zipped file: https://drive.google.com/file/d/1NpfDJHKSh9Sv_ycMrNLf5lpOCn9giwQI/view?usp=sharing
Hey @olinesn, is this a single-animal experiment? I noticed you have tracking enabled. If that is the case, let's just go ahead and do inference without tracking; you can disable the tracker.
Hi @eberrigan, OK, this seems to be running without complaint; it's strange that tracking would be causing this problem. Thanks for the suggestion! A fraction of this dataset is two animals, so I'm going to need to do tracking eventually. Is there a better way I can set up the flow tracking? Does this behavior indicate a memory leak in SLEAP? Thanks,
You should be able to do tracking, but there isn't a reason to when there is only one animal. I believe the issue was that inference did not know how many animals are in the new data, so the shape of the tensor changes, causing retracing (tensorflow/tensorflow#34025). This might not be an issue when using the bottom-up method, which doesn't rely on the centroid model. If you have some data with different numbers of animals, you might get better results running inference separately and specifying the number of animals per dataset using `--max_instances`.
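The retracing behavior mentioned above can be illustrated with a pure-Python analogy (this is only a sketch of the caching-by-shape idea behind `tf.function` retracing, not SLEAP's or TensorFlow's actual implementation; the names `traced` and `trace_cache` are hypothetical):

```python
# Pure-Python analogy of TensorFlow's tf.function retracing:
# a "compiled" version of a function is cached per input shape,
# so inputs with varying shapes keep triggering expensive new traces.

trace_cache = {}   # shape -> "compiled" function
trace_count = 0    # how many times we had to (re)trace

def traced(batch):
    """Run `batch` through a per-shape compiled function."""
    global trace_count
    shape = (len(batch), len(batch[0]))   # e.g. (n_instances, n_points)
    if shape not in trace_cache:
        trace_count += 1                  # expensive: build a new trace
        trace_cache[shape] = lambda b: [sum(row) for row in b]
    return trace_cache[shape](batch)

# Fixed number of instances per frame: traced once, then reused.
for _ in range(100):
    traced([[1.0, 2.0], [3.0, 4.0]])      # shape (2, 2) every time
assert trace_count == 1

# Varying number of instances per frame: a new trace per new shape.
for n in range(1, 6):
    traced([[1.0, 2.0]] * n)              # shapes (1, 2) .. (5, 2)
print(trace_count)  # → 5 (shape (2, 2) was already cached)
```

This is why pinning the maximum number of instances keeps shapes stable and avoids repeated retracing.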
Thanks, that's helpful to understand. Sometimes I place one mouse in the box, and sometimes two. Could you advise some sample syntax for the sleap-track command when there are two animals in the box? The logic of the tensor changing size makes sense, but I want to make sure I nail the syntax the way you're recommending.
I see. Can you separate the videos so that when there is only one animal, you run inference with the number of instances set to one? It will also improve tracking a lot, since if everything is one video and tracking is run, a new track is made when an animal reappears, so I expect that if you are swapping animals or removing and replacing animals, you may end up with a lot of tracks at the end of the video. Do you have some sort of pipeline for dealing with that?
Sorry I got slammed at the end of last week. Thanks for your thoughts.
Yes it's perfectly doable for me to pre-determine the number of animals for the majority of these experiments. What would a reasonable sleap-track command look like? I just want to make sure I'm interpreting this correctly:
Are "-n" and "--max_instances" synonymous, or do you have to use both to get this effect? I've never used "-n".
We actually have to think about this a lot because of reflections. Sometimes if there are 3 animals plus reflections, one of the reflections can get picked up as an instance, and occasionally it has a higher score than the 3 real animals. If we set max instances to 3, then we drop the instance of a real animal on that frame, so usually we're setting it to n+1 or n+2.
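Putting the suggestions in this thread together, the two invocations might look something like the sketch below. The file paths and model folder names are placeholders, and you should confirm the exact flags with `sleap-track --help` on your install:

```shell
# Hypothetical paths and model folders; flags as discussed in this thread.

# Single-animal videos: cap instances at 1 and skip tracking entirely.
sleap-track one_mouse.mp4 \
    -m models/centroid -m models/centered_instance \
    --max_instances 1 \
    --tracking.tracker none \
    -o one_mouse.predictions.slp

# Two-animal videos: allow a margin for reflections (n+1), then track.
sleap-track two_mice.mp4 \
    -m models/centroid -m models/centered_instance \
    --max_instances 3 \
    --tracking.tracker flow \
    -o two_mice.predictions.slp
```

Keeping the instance cap fixed within each run keeps the tensor shapes stable, which is the point of splitting the videos by animal count.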
Hi @olinesn, Yes, "-n" and "--max_instances" are synonymous; you only need one of them. Thanks,
@eberrigan Unfortunately, this doesn't seem to solve the problem. Are you able to give it a go? If you can download the zipped file and try running sleap-track, I'm curious whether it runs for you or crashes.
Bug description
When running inference, the GPU starts using system memory (as seen in Task Manager) until there's none left, and then inference crashes. You can see that the GPU's "Shared GPU memory usage" climbs, but the on-GPU (dedicated) memory is hardly used at all.
Expected behaviour
Inference completes without issue, the way that it does for shorter videos.
Actual behaviour
Your personal set up
- CPU: Threadripper PRO, 24 cores
- RAM: 128 GB
- GPU: NVIDIA RTX 6000
- OS: Windows 11
- SLEAP version: 1.3.3
Environment packages
Logs
Screenshots