prepare_dataset issue #21

Open
julianaamorim opened this issue Jan 17, 2024 · 4 comments

Comments

@julianaamorim

Hello, I have just installed psearch and all the env dependencies in conda. I downloaded the acetylcholinesterase (AChE) dataset to run a test, and during the dataset preparation step I came across the following error:

Traceback (most recent call last):
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/site-packages/psearch/prepare_dataset.py", line 61, in common
    create_db.main_params(dbout_fname=filenames[4],
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/site-packages/psearch/scripts/create_db.py", line 156, in main_params
    for i, res in enumerate(p.imap_unordered(map_process_mol,
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/pool.py", line 451, in <genexpr>
    return (item for chunk in result for item in chunk)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/pool.py", line 873, in next
    raise value
OSError: File error: Invalid input file /home/juliana/Downloads/Ache/compounds/inactive_conf.sdf
Process Process-1:
Traceback (most recent call last):
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/site-packages/psearch/prepare_dataset.py", line 61, in common
    create_db.main_params(dbout_fname=filenames[4],
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/site-packages/psearch/scripts/create_db.py", line 156, in main_params
    for i, res in enumerate(p.imap_unordered(map_process_mol,
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/pool.py", line 451, in <genexpr>
    return (item for chunk in result for item in chunk)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/pool.py", line 873, in next
    raise value
OSError: File error: Invalid input file /home/juliana/Downloads/Ache/compounds/active_conf.sdf

No file in the "compounds" folder was generated... Any suggestions on how to move forward?

Thanks in advance...

Juliana
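
The `OSError` above complains about `active_conf.sdf` and `inactive_conf.sdf`, and no files appeared in the "compounds" folder, so one thing worth ruling out is that the conformer files were never written or are empty. The following is a minimal sketch using only the standard library, not part of psearch; the helper name `inspect_sdf` is hypothetical. It reports whether a file exists, its size, and how many molecule records it holds, counting the `$$$$` delimiter lines that end each record in the SDF format.

```python
from pathlib import Path


def inspect_sdf(path):
    """Return (exists, size_in_bytes, n_records) for an SDF file.

    Hypothetical helper, not a psearch function. Records are counted
    by the "$$$$" delimiter line that terminates each molecule block.
    """
    p = Path(path)
    if not p.is_file():
        # Missing file: nothing to count.
        return False, 0, 0

    n_records = 0
    # errors="replace" keeps the scan going even if the file has odd bytes.
    with p.open("r", errors="replace") as fh:
        for line in fh:
            if line.strip() == "$$$$":
                n_records += 1
    return True, p.stat().st_size, n_records
```

Calling `inspect_sdf("/home/juliana/Downloads/Ache/compounds/active_conf.sdf")` and getting `(False, 0, 0)` would point to the conformer generation step never having produced the file, rather than to a malformed file.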

@DrrDom
Collaborator

DrrDom commented Jan 18, 2024

Which branch did you install psearch from? You have to use the gen_pharms branch; it is the most recent. Unfortunately, we have not yet fixed all remaining bugs and merged it into master.
Another point: I have never used psearch with Python 3.12; the highest version I used was 3.9. However, it should not be a problem.

@julianaamorim
Author

Running psearch on my current working dataset got my trainset list stuck... I tried some checking in select_training_set_rdkit.py to understand the problem, but to no avail...

>> psearch -p my_models_2/created_pharmacophores/ -i beta_2_short.smi -d dbs/beta.dat -c 4
Size of df before generating fingerprints: (101, 3)
Size of df after generating fingerprints: (101, 4)
Size of df_mols before concatenation: (101, 4)
Size of df_mols after concatenation: (101, 5)
100 molecules screened 00:00:01
external_statistics.txt: (0.009s)

Any light?

@DrrDom
Collaborator

DrrDom commented Jan 28, 2024

Does it return any pharmacophore model? If not, it may be that it cannot create training sets. I ran into something similar in the past, and it would be reasonable to implement another modeling mode in which all input ligands are taken as the training set, without the selection step used now. It may also be a bug. If you can share your data set, I can look at it when I have time, but I cannot promise to do this quickly.

@julianaamorim
Author

Increasing the number of active compounds solved the problem...
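
Since the fix was to add more active compounds, a quick count of labels in the input file can show whether a data set is likely to hit the same problem. This is a rough sketch, not psearch code. It assumes the `.smi` input has whitespace-separated columns with the activity label in the third column; that layout, the example label values, and the helper name `count_labels` are assumptions made here for illustration.

```python
from collections import Counter


def count_labels(smi_path, label_col=2):
    """Count how often each activity label occurs in a .smi-style file.

    Hypothetical helper. Assumes whitespace-separated columns with the
    label at index `label_col` (0-based); lines with fewer columns are
    skipped. A header line, if present, is counted like any other row.
    """
    counts = Counter()
    with open(smi_path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) > label_col:
                counts[fields[label_col]] += 1
    return counts
```

For example, `count_labels("beta_2_short.smi")` returns a `Counter` mapping each label found in the third column to its number of rows, which makes a very small active class easy to spot before running the modeling step.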
