"Bad_malloc" in train_2D.py after following the instructions. #82
Comments
@paperplane03 My guess is that you are running out of memory. Try lowering the |
Does this mean that if I download only the 36.7G of data, I will not have all the data needed to train the example model? |
I hit the same problem on our 4x4090 server, even with batch_size = 1 in train_2D.py. |
Some of the datasets are indeed incredibly large. I am looking into the issue of the data not downloading for |
@paperplane03 and @Luchixiang, what OS is each of you using? |
Ubuntu x86_64 with ~100G free memory. |
@rhoadesScholar We use Ubuntu x86_64. Thank you for your help! |
Regarding finding empty |
Still working on the "Bad_malloc" unfortunately... |
@Luchixiang @paperplane03 Can you add 'device="cpu"' to the training script, start training, and then watch 'top' in the terminal to observe the memory usage? I'm having trouble replicating the bug consistently. |
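For anyone who would rather log memory from inside the process than watch 'top' by hand, here is a minimal sketch using only the Python standard library. The helper names `peak_rss_mib` and `log_memory` are hypothetical and not part of this project's training script; the sketch assumes a Unix-like system, where the `resource` module is available.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(step: int) -> str:
    """Print and return a one-line memory report for the given step (hypothetical helper)."""
    message = f"step {step}: peak RSS {peak_rss_mib():.1f} MiB"
    print(message, flush=True)
    return message


if __name__ == "__main__":
    # Stand-in loop: allocate a buffer each step and report peak memory.
    for step in range(3):
        buffer = bytearray(10 * 1024 * 1024)
        log_memory(step)
```

Calling `log_memory(step)` once per training iteration would show whether host memory climbs steadily before the crash. It reports only the calling process, so memory used by separate data-loading worker processes (if any) would not be included.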
We will do it. Do you mean that the "bad malloc" error does not occur on your server? |
Any update on this? |
Dear staff,
I updated the repo and followed the instructions, but I ran into a bad malloc problem.
$ python train_2D.py
Training device: cuda
Reading csv file...
Constructing datasets...
Training datasets: 100%|██████████████████████████████████████████████████████████████████████████████| 231/231 [00:00<00:00, 1087.41it/s]
Validation datasets: 100%|██████████████████████████████████████████████████████████████████████████████| 42/42 [00:00<00:00, 2241.66it/s]
Verifying datasets...
Training datasets: 100%|███████████████████████████████████████████████████████████████████████████████| 231/231 [00:00<00:00, 687.97it/s]
Validation datasets: 100%|██████████████████████████████████████████████████████████████████████████████| 42/42 [00:00<00:00, 1000.40it/s]
Training: 0%| | 0/1000 [00:00<?, ?it/s]terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
Aborted (core dumped)
I did not encounter it before. Did a recent update introduce the error?