Training #19
Hi, I created this split only so that students can run experiments with limited computing resources. I did not run any experiments with sdf_hand_mini and sdf_obj_mini myself.
Thank you for your reply.
DeepSdf - INFO - time used: 85.93944382667542
I think this line is the problem.
How do I train with the sdf_hand_mini and sdf_obj_mini data that you uploaded?
I think training fails because it looks for a .npz file that does not exist in the mini version I put in place.
(alignsdf) MS-7B23:~/mount4t/AlignSDF$ CUDA_VISIBLE_DEVICES=0 bash dist_train.sh 4 6666 -e experiments/obman/30k_1e2d_mlp5.json
do not support renderer in this machine
DeepSdf - INFO - Added key: store_based_barrier_key:1 to store for rank: 0
DeepSdf - INFO - Training in distributed mode, 1 GPU per process. Process 0, total 1.
DeepSdf - INFO - Experiment description:
3D hand reconstruction on the mini obman dataset.
Hand branch: True
Object branch: True
Mano branch: False
Depth branch: False
Classifier Weight: 0
Penetration Loss: False
Penetration Loss Weight: 0
Additional Loss start at epoch: 1201
Contact Loss: False
Contact Loss Weight: 0
Contact Loss Sigma (m): 0.005
Independent Obj Scale: False
Ignore other: False
nb_label_class: 6
Image encoder, the branch has latent size 256
DeepSdf - INFO - Finish constructing the dataset
DeepSdf - INFO - start_epoch:1, current_rank:0
DeepSdf - INFO - epoch:1, current_rank:0
Traceback (most recent call last):
File "train.py", line 715, in <module>
main_function(exp_cfg, args.continue_from, args.local_rank, args.opt_level, args.slurm)
File "train.py", line 465, in main_function
for i, (input_iter, label_iter, meta_iter) in enumerate(sdf_loader):
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/gaeun/mount4t/AlignSDF/utils/data.py", line 162, in __getitem__
hand_samples, hand_labels = unpack_sdf_samples(self.data_source, data_key, num_sample, hand=True, clamp=self.clamp, filter_dist=self.filter_dist)
File "/home/gaeun/mount4t/AlignSDF/utils/sdf_utils.py", line 172, in unpack_sdf_samples
npz = np.load(npz_path)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/numpy/lib/npyio.py", line 405, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'data/obman/train/sdf_hand/00018168.npz'
Killing subprocess 12576
Traceback (most recent call last):
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
main()
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/gaeun/anaconda3/envs/alignsdf/bin/python', '-u', 'train.py', '--local_rank=0', '-e', 'experiments/obman/30k_1e2d_mlp5.json']' returned non-zero exit status 1.
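The traceback above shows the loader asking for data/obman/train/sdf_hand/00018168.npz, which suggests the list of training samples contains IDs that are not part of the mini data. A minimal sketch for checking this before launching training is below. It is not part of AlignSDF: the helper names `missing_samples` and `available_samples` are hypothetical, and it assumes only that each sample is stored as `<id>.npz` in its folder, as the path in the error indicates. How the repository stores its list of sample IDs is not shown here, so the IDs are passed in as plain strings.

```python
from pathlib import Path
from typing import Iterable, List


def missing_samples(sample_ids: Iterable[str], sdf_dir: str) -> List[str]:
    """Return the IDs in sample_ids that have no <id>.npz file in sdf_dir."""
    root = Path(sdf_dir)
    return [sid for sid in sample_ids if not (root / f"{sid}.npz").is_file()]


def available_samples(hand_dir: str, obj_dir: str) -> List[str]:
    """Return the sorted IDs that have an .npz file in both folders."""
    hand_ids = {p.stem for p in Path(hand_dir).glob("*.npz")}
    obj_ids = {p.stem for p in Path(obj_dir).glob("*.npz")}
    return sorted(hand_ids & obj_ids)
```

For example, `missing_samples(["00018168"], "data/obman/train/sdf_hand")` would return `["00018168"]` on the setup from the log, and `available_samples(...)` gives the IDs that could safely be kept if the sample list were restricted to the mini data.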