Understanding models results #17

Open
sherifelsabbagh opened this issue Dec 7, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@sherifelsabbagh

Hi,

I have a question about the results of model building.

In the statistics file, I can see a line like this:

cdk8 t1_f7_p0 36 11 94 131 0.766 0.383 0.084 0.511 0.426 0.638 0.65 1.833 4 8.883 aaAAHH

I understand that f7 refers to 7 features, but I can see only six features in aaAAHH (2 aromatic, 2 acceptor and 2 hydrophobic), so where is the 7th feature?

It also says there are 4 unique features. Should it be 3 (A, a and H)?

One last thing: when I download the xyz file, how can I view it and relate it to the features above? When I open it in PyMOL I see only 3 spheres.

@DrrDom
Collaborator

DrrDom commented Dec 9, 2023

You are right, f7 is expected to indicate that there are 7 features in the pharmacophore. This looks like a bug, but after a quick investigation I could not find the source of the error. I'll label this issue as a bug to be fixed in the future. It should not affect the output models.

Unique features are features with distinct coordinates. In your case I expect that the aromatic and hydrophobic features have the same coordinates, so each a/H pair is counted as a single feature. The two acceptors have different coordinates. So overall there are two acceptors and two a/H pairs at different coordinates, which gives 4 unique features. The name may be confusing. The reason for counting this way is to better discriminate the spatial complexity of pharmacophore models.
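The counting rule above can be sketched in a few lines of Python. This is only an illustration: the coordinates are made up, and the project's own code may count unique features differently (for example, with a distance tolerance).

```python
# Hypothetical coordinates for the six labels in "aaAAHH" (illustration only).
features = [
    ("a", (0.0, 0.0, 0.0)),
    ("H", (0.0, 0.0, 0.0)),   # same position as the first aromatic feature
    ("a", (4.2, 1.0, 0.0)),
    ("H", (4.2, 1.0, 0.0)),   # same position as the second aromatic feature
    ("A", (2.0, 3.5, 1.0)),
    ("A", (6.1, -2.0, 0.5)),
]

# "Unique" here means distinct coordinates, not distinct labels.
unique_positions = {xyz for _, xyz in features}
unique_labels = {label for label, _ in features}

print(len(features), "features")                  # 6 features
print(len(unique_positions), "unique positions")  # 4 unique positions
print(len(unique_labels), "distinct labels")      # 3 distinct labels
```

Counting distinct labels gives 3, which is the number the question expected, while counting distinct positions gives the 4 reported in the statistics file.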

To see all features in PyMOL you can force it to show them as spheres. Alternatively, you can use a PyMOL script, see #15.
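To relate the spheres to the feature labels without PyMOL, one option is to read the xyz file directly and list which labels share a position. The sketch below assumes the common XYZ layout (an optional count line and comment line, then one `label x y z` record per line). The file written by the tool may differ, and the example text and the helper name `read_xyz_features` are hypothetical.

```python
def read_xyz_features(text):
    """Parse 'label x y z' records, skipping lines that do not fit that shape."""
    feats = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # count line, comment line, or blank line
        try:
            xyz = tuple(float(v) for v in parts[1:])
        except ValueError:
            continue  # four tokens, but not a coordinate record
        feats.append((parts[0], xyz))
    return feats

# Hypothetical file content; real coordinates come from the downloaded model.
example = """6
hypothetical model
a 0.0 0.0 0.0
H 0.0 0.0 0.0
a 4.2 1.0 0.0
H 4.2 1.0 0.0
A 2.0 3.5 1.0
A 6.1 -2.0 0.5
"""

# Group labels by position so overlapping features become visible.
by_position = {}
for label, xyz in read_xyz_features(example):
    by_position.setdefault(xyz, []).append(label)

for xyz, labels in by_position.items():
    print(xyz, "+".join(labels))
```

Positions that list more than one label are places where two features would be drawn on top of each other in a viewer.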

DrrDom added the bug label on Dec 9, 2023
@julianaamorim

Hi again,

I would like to understand what the criteria are for selecting the best model, since there is no alignment. Recall > precision > FPR, etc.?
Isn't the screening of a database more limited when different features (a and H) have the same coordinates in the same model? Or not?

Thanks.

@DrrDom
Collaborator

DrrDom commented Jan 22, 2024

> I would like to understand what the criteria are for selecting the best model since there is no alignment. Recall> precision> FPR, etc ?

If you are asking about selecting the final model for virtual screening, that is entirely your choice, as in any other case; alignment would not help with that. You may choose the model with the highest precision to retrieve actives with higher probability (a conservative strategy), or you may choose a model with higher recall to increase the chances of retrieving diverse hits.
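As a rough sketch of those two strategies, the snippet below picks one model by precision and one by recall. The model names and metric values are hypothetical; in practice they would come from the statistics file.

```python
# Hypothetical models and metric values, for illustration only.
models = [
    {"name": "model_1", "precision": 0.90, "recall": 0.30},
    {"name": "model_2", "precision": 0.70, "recall": 0.55},
    {"name": "model_3", "precision": 0.50, "recall": 0.80},
]

# Conservative strategy: highest precision, recall used only to break ties.
conservative = max(models, key=lambda m: (m["precision"], m["recall"]))

# Diversity-oriented strategy: highest recall, precision used only to break ties.
exploratory = max(models, key=lambda m: (m["recall"], m["precision"]))

print("conservative choice:", conservative["name"])
print("exploratory choice:", exploratory["name"])
```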

If you are asking how models are selected internally on each iteration, there is a function strategy_extract_trainset in gen_pharm_models.py. It is also described in the paper. Different modeling strategies use different criteria:

if clust_strategy == 2:
    df = df.sort_values(by=['recall', 'F2', 'F05'], ascending=False).reset_index(drop=True)
    if df['F2'].iloc[0] == 1.0:
        df = df[(df['recall'] == 1.0) & (df['F2'] == 1.0)]
    elif df[df['F2'] >= 0.8].shape[0] <= 100:
        df = df[(df['recall'] == 1) & (df['F2'] >= 0.8)]
    else:
        df = df[(df['recall'] == 1) & (df['F2'] >= df['F2'].loc[100])]
elif clust_strategy == 1:
    df = df.sort_values(by=['recall', 'F05', 'F2'], ascending=False).reset_index(drop=True)
    df = df[df['F05'] >= 0.8] if df[df['F05'] >= 0.8].shape[0] <= 100 else df[df['F05'] >= df['F05'].loc[100]]
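For readers unfamiliar with the F2 and F05 columns used above, they are presumably the standard F-beta scores with beta = 2 and beta = 0.5; that is an assumption here, and the paper or source code should be checked for the exact definition. A minimal sketch of the standard formula:

```python
def f_beta(precision, recall, beta):
    """Standard F-beta score: beta > 1 favors recall, beta < 1 favors precision."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing is retrieved correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical model with perfect recall but modest precision.
precision, recall = 0.5, 1.0
f2 = f_beta(precision, recall, beta=2)     # weights recall more heavily
f05 = f_beta(precision, recall, beta=0.5)  # weights precision more heavily
print(round(f2, 3), round(f05, 3))
```

Under this definition, sorting by F2 first (clust_strategy == 2) rewards models that recover more actives, while sorting by F05 first (clust_strategy == 1) rewards models that retrieve fewer inactives.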

> Isn't the screening of a database more limited with the same coordinate for different features (a and H) in a same model? Or not

Yes, it is more limited: if the a and H features have the same coordinates, such a model can match only aromatic groups. An H feature alone also matches saturated carbocycles and alkyl groups.

Hope this helps.
