Understanding models results #17
Comments
You are right, f7 is expected to indicate that there are 7 features in a pharmacophore. This looks like a bug, but after a quick investigation I could not find the source of the error. I'll label this issue as a bug to fix in the future. It should not affect the output models. Unique features are features with distinct coordinates. In your case I expect that the aromatic and hydrophobic features have the same coordinates, so each pair is counted as a single feature. The two acceptors have different coordinates. So, overall there are two acceptors and two pairs of a and H features with different coordinates, which gives 4 unique features. The name could be confusing. The reason for counting this way is to better discriminate the spatial complexity of pharmacophore models. To see all features in PyMOL you may force it to show them as spheres. Alternatively, you may use the PyMOL script from #15.
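The counting rule described in this comment can be sketched as follows. This is only an illustration, not the project's code: the coordinates are invented, `group_by_coordinates` is a hypothetical helper, and it assumes two features count as one when their coordinates are equal after rounding.

```python
from collections import defaultdict

# Hypothetical pharmacophore as (label, (x, y, z)) pairs. The coordinates are
# invented; each aromatic (a) shares its position with one hydrophobic (H),
# while the two acceptors (A) sit at their own positions.
features = [
    ("a", (1.0, 2.0, 3.0)),
    ("H", (1.0, 2.0, 3.0)),
    ("a", (4.5, 0.5, 1.0)),
    ("H", (4.5, 0.5, 1.0)),
    ("A", (7.0, 1.0, 2.0)),
    ("A", (2.0, 6.0, 0.0)),
]


def group_by_coordinates(feats, ndigits=3):
    """Group feature labels by rounded coordinates (assumed equality rule)."""
    groups = defaultdict(list)
    for label, xyz in feats:
        key = tuple(round(c, ndigits) for c in xyz)
        groups[key].append(label)
    return dict(groups)


groups = group_by_coordinates(features)
n_unique = len(groups)  # number of distinct positions

for xyz, labels in groups.items():
    print(xyz, "".join(labels))
print("features:", len(features), "unique positions:", n_unique)
```

With this toy input, six labelled features collapse to four distinct positions, which is also why a viewer that draws one sphere per position shows fewer spheres than there are labels.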
Hi again, I would like to understand what the criteria are for selecting the best model, since there is no alignment. Recall > precision > FPR, etc.? Thanks.
If you are asking about selecting the final model to be used for virtual screening, this is entirely your choice, as in any other case; alignment will not help with that. You may choose a model with the highest precision to retrieve actives with higher probability (a conservative strategy), or you may choose a model with larger recall to increase the chances of retrieving diverse hits. If you are asking how models are selected internally on each iteration, there is a function for that:

```python
if clust_strategy == 2:
    # rank by recall, then F2, then F0.5 (all descending)
    df = df.sort_values(by=['recall', 'F2', 'F05'], ascending=False).reset_index(drop=True)
    if df['F2'].iloc[0] == 1.0:
        # the top-ranked model has F2 == 1: keep only models with recall and F2 equal to 1
        df = df[(df['recall'] == 1.0) & (df['F2'] == 1.0)]
    elif df[df['F2'] >= 0.8].shape[0] <= 100:
        # at most 100 models reach F2 >= 0.8: keep those that also have recall == 1
        df = df[(df['recall'] == 1) & (df['F2'] >= 0.8)]
    else:
        # otherwise use the F2 value of the row at index 100 as the cutoff
        df = df[(df['recall'] == 1) & (df['F2'] >= df['F2'].loc[100])]
elif clust_strategy == 1:
    # rank by recall, then F0.5, then F2 (all descending)
    df = df.sort_values(by=['recall', 'F05', 'F2'], ascending=False).reset_index(drop=True)
    # keep F0.5 >= 0.8, or cut at the row at index 100 when more than 100 models pass
    df = df[df['F05'] >= 0.8] if df[df['F05'] >= 0.8].shape[0] <= 100 else df[df['F05'] >= df['F05'].loc[100]]
```
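To see how the two strategies behave, the logic quoted above can be run on a small table. This is a sketch under assumptions: `select_models`, its `f_min` and `max_models` parameters, and the toy metric values are all hypothetical and introduced here for illustration; only the filtering rules themselves come from the snippet.

```python
import pandas as pd


def select_models(df, clust_strategy, f_min=0.8, max_models=100):
    """Hypothetical wrapper around the filtering rules quoted in the thread."""
    if clust_strategy == 2:
        df = df.sort_values(by=['recall', 'F2', 'F05'], ascending=False).reset_index(drop=True)
        if df['F2'].iloc[0] == 1.0:
            df = df[(df['recall'] == 1.0) & (df['F2'] == 1.0)]
        elif df[df['F2'] >= f_min].shape[0] <= max_models:
            df = df[(df['recall'] == 1) & (df['F2'] >= f_min)]
        else:
            df = df[(df['recall'] == 1) & (df['F2'] >= df['F2'].loc[max_models])]
    elif clust_strategy == 1:
        df = df.sort_values(by=['recall', 'F05', 'F2'], ascending=False).reset_index(drop=True)
        if df[df['F05'] >= f_min].shape[0] <= max_models:
            df = df[df['F05'] >= f_min]
        else:
            df = df[df['F05'] >= df['F05'].loc[max_models]]
    return df


# Toy models with illustrative (not measured) metric values.
toy = pd.DataFrame({
    'model':  ['m1', 'm2', 'm3', 'm4'],
    'recall': [1.0, 1.0, 0.6, 1.0],
    'F2':     [0.952, 0.682, 0.648, 0.859],
    'F05':    [0.833, 0.349, 0.851, 0.604],
})

kept_s2 = select_models(toy, clust_strategy=2)['model'].tolist()
kept_s1 = select_models(toy, clust_strategy=1)['model'].tolist()
print("strategy 2 keeps:", kept_s2)
print("strategy 1 keeps:", kept_s1)
```

On this toy table, strategy 2 keeps only models with full recall and a high F2, while strategy 1 filters on F0.5 and can therefore keep a precise model whose recall is below 1.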
Yes, it is more limited, because if a and H features have the same coordinates, such a model can match only aromatic groups. An H feature alone also matches saturated carbocycles and alkyl groups. Hope this helps.
Hi,
I have a question related to the results of model building.
In the statistics file, I can see a line like this:
cdk8 t1_f7_p0 36 11 94 131 0.766 0.383 0.084 0.511 0.426 0.638 0.65 1.833 4 8.883 aaAAHH
I understand that f7 refers to 7 features, but I can see only six features in aaAAHH (2 aromatic, 2 acceptor and 2 hydrophobic). So where is the 7th feature?
Also, it is reported that there are 4 unique features. Should it be 3 (A, a and H)?
Lastly, when I download the xyz file, how can I view it and relate it to the features above? When I open it in PyMOL I see only 3 spheres.
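As a reading aid for the statistics line above, the sketch below recomputes the metrics from the four integer columns. The column meanings are an assumption inferred from the arithmetic, not something the thread confirms: the integers are taken to be true positives, false positives, total actives and total inactives, and the six decimals that follow are taken to be precision, recall, FPR, F1, F2 and F0.5. The `fbeta` helper is the standard F-beta formula.

```python
row = "cdk8 t1_f7_p0 36 11 94 131 0.766 0.383 0.084 0.511 0.426 0.638 0.65 1.833 4 8.883 aaAAHH"
fields = row.split()

# ASSUMPTION: fields 3-6 are TP, FP, number of actives, number of inactives.
tp, fp, n_act, n_inact = (int(x) for x in fields[2:6])
# ASSUMPTION: the next six fields are precision, recall, FPR, F1, F2, F0.5.
reported = [float(x) for x in fields[6:12]]

precision = tp / (tp + fp)
recall = tp / n_act
fpr = fp / n_inact


def fbeta(p, r, beta):
    """Standard F-beta score: beta > 1 weights recall, beta < 1 weights precision."""
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


computed = [
    precision,
    recall,
    fpr,
    fbeta(precision, recall, 1.0),
    fbeta(precision, recall, 2.0),
    fbeta(precision, recall, 0.5),
]

names = ["precision", "recall", "FPR", "F1", "F2", "F05"]
for name, value, rep in zip(names, computed, reported):
    print(f"{name:9s} computed={value:.3f} reported={rep}")
```

Under these assumptions the recomputed values agree with the reported ones to three decimals, which supports this reading of the columns but does not prove it.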