Hello, I enjoyed your work, and thank you for sharing the code. I'm facing some issues training the model. I've already downloaded and placed the data folder. I believe the issue lies with the parameters: I couldn't see anything in the docker run setup that passes the parameters (possibly the configs) to the runner, so I modified the command to work outside of the Docker environment. Here is the command:
```
python -m src.graphqa.train config/train.yaml --model config/model.yaml --session config/session.yaml --in_memory=yes
```
and here is the error message:
```
usage: train.py [-h] [--logger [LOGGER]] [--checkpoint_callback [CHECKPOINT_CALLBACK]] [--default_root_dir DEFAULT_ROOT_DIR] [--gradient_clip_val GRADIENT_CLIP_VAL] [--process_position PROCESS_POSITION] [--num_nodes NUM_NODES] [--num_processes NUM_PROCESSES] [--gpus GPUS] [--auto_select_gpus [AUTO_SELECT_GPUS]] [--tpu_cores TPU_CORES] [--log_gpu_memory LOG_GPU_MEMORY] [--progress_bar_refresh_rate PROGRESS_BAR_REFRESH_RATE] [--overfit_batches OVERFIT_BATCHES] [--track_grad_norm TRACK_GRAD_NORM] [--check_val_every_n_epoch CHECK_VAL_EVERY_N_EPOCH] [--fast_dev_run [FAST_DEV_RUN]] [--accumulate_grad_batches ACCUMULATE_GRAD_BATCHES] [--max_epochs MAX_EPOCHS] [--min_epochs MIN_EPOCHS] [--max_steps MAX_STEPS] [--min_steps MIN_STEPS] [--limit_train_batches LIMIT_TRAIN_BATCHES] [--limit_val_batches LIMIT_VAL_BATCHES] [--limit_test_batches LIMIT_TEST_BATCHES] [--limit_predict_batches LIMIT_PREDICT_BATCHES] [--val_check_interval VAL_CHECK_INTERVAL] [--flush_logs_every_n_steps FLUSH_LOGS_EVERY_N_STEPS] [--log_every_n_steps LOG_EVERY_N_STEPS] [--accelerator ACCELERATOR] [--sync_batchnorm [SYNC_BATCHNORM]] [--precision PRECISION] [--weights_summary WEIGHTS_SUMMARY] [--weights_save_path WEIGHTS_SAVE_PATH] [--num_sanity_val_steps NUM_SANITY_VAL_STEPS] [--truncated_bptt_steps TRUNCATED_BPTT_STEPS] [--resume_from_checkpoint RESUME_FROM_CHECKPOINT] [--profiler [PROFILER]] [--benchmark [BENCHMARK]] [--deterministic [DETERMINISTIC]] [--reload_dataloaders_every_epoch [RELOAD_DATALOADERS_EVERY_EPOCH]] [--auto_lr_find [AUTO_LR_FIND]] [--replace_sampler_ddp [REPLACE_SAMPLER_DDP]] [--terminate_on_nan [TERMINATE_ON_NAN]] [--auto_scale_batch_size [AUTO_SCALE_BATCH_SIZE]] [--prepare_data_per_node [PREPARE_DATA_PER_NODE]] [--plugins PLUGINS] [--amp_backend AMP_BACKEND] [--amp_level AMP_LEVEL] [--distributed_backend DISTRIBUTED_BACKEND] [--automatic_optimization [AUTOMATIC_OPTIMIZATION]] [--move_metrics_to_cpu [MOVE_METRICS_TO_CPU]] [--enable_pl_optimizer [ENABLE_PL_OPTIMIZER]] [--multiple_trainloader_mode MULTIPLE_TRAINLOADER_MODE] [--stochastic_weight_avg [STOCHASTIC_WEIGHT_AVG]] [--resume RESUME] [rest [rest ...]]
train.py: error: unrecognized arguments: --model config/model.yaml --session config/session.yaml --in_memory=yes
```
If I remove the unrecognized arguments, it gets past that error, but then I get an error caused by the PyTorch Geometric version:
```
RuntimeError: The 'data' object was created by an older version of PyG. If this error occurred while loading an already existing dataset, remove the 'processed/' directory in the dataset's root folder and try again.
```
I tried the suggested 'fix' and removed the processed/ folder from each CASP{i} directory, for i in 9..13. Then I get the following error:
```
ValueError: With n_samples=0, test_size=None and train_size=0.85, the resulting train set will be empty. Adjust any of the aforementioned parameters.
```
This is probably because the script tries to read the processed files directly rather than checking for them and regenerating them when they are missing; hence n_samples=0.
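For what it's worth, the guard I would have expected looks something like this. This is a minimal sketch, not the repository's actual code: `reprocess` is a hypothetical hook standing in for whatever turns the raw CASP decoys into processed `.pt` files.

```python
from pathlib import Path
from typing import Callable, List


def load_processed(root: str, reprocess: Callable[[Path], None]) -> List[Path]:
    """Return the processed graph files, rebuilding them first if absent.

    `reprocess` is a hypothetical callback, not GraphQA's API: it should
    write the processed .pt files into the directory it receives.
    """
    processed = Path(root) / "processed"
    if not processed.is_dir() or not any(processed.glob("*.pt")):
        processed.mkdir(parents=True, exist_ok=True)
        reprocess(processed)  # regenerate instead of assuming the files exist
    files = sorted(processed.glob("*.pt"))
    if not files:
        # fail loudly here rather than downstream with n_samples=0
        raise RuntimeError(f"no processed graphs under {processed}")
    return files
```

With a check like this, deleting `processed/` would trigger regeneration instead of handing an empty sample list to the train/validation split.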
Edit: Solved by writing a custom script that loads the graphs with the older torch_geometric (1.7.2) Data class and converts each one to a dictionary using the to_dict() method. Then, with the newer torch_geometric (2.0.1), I load them back with the from_dict() method. This fixes the preprocessed-data loading problem for recent versions of torch_geometric.

But unfortunately, even after the fixes mentioned above, the results are nowhere near the reported ones. After a few runs, the best RMSE I got was 0.165, whereas 0.130 is reported. Please note that I trained using the provided train.yaml file. Although the hardware is different, the gap between the reported and reproduced results is very large. I would like to contribute to improving reproducibility, but right now I have hit a roadblock. I hope the authors can provide some clarification.
Another problem I've noticed: train.raw.yml is not runnable; it returns the following error:
```
AttributeError: 'NoneType' object has no attribute 'out_edge_feats'
```