
multidms fitting bug in func_effect_shifts.ipynb #92

Closed
jgallowa07 opened this issue Jan 9, 2024 · 3 comments · Fixed by #93
jgallowa07 (Collaborator) opened this issue Jan 9, 2024

While working on #86, we encountered a bug in the lasso-sweep code in the notebook:

lasso_shifts = params["lasso_shifts"]
assert len(lasso_shifts) == len(set(lasso_shifts))
n_lasso = len(params["lasso_shifts"])
fig, ax = plt.subplots(n_lasso, 2, figsize=[6, n_lasso * 3])

model = multidms.Model(data)
mutations_df = []
for i, lasso_shift in enumerate(lasso_shifts):
    lasso_shift = float(lasso_shift)
    print(f"Fitting model for {lasso_shift=}")
    model.fit(lasso_shift=lasso_shift)
    mutations_df.append(
        model.get_mutations_df(phenotype_as_effect=True).assign(lasso_shift=lasso_shift)
    )
    model.plot_epistasis(ax=ax[i, 1], alpha=0.1, show=False, legend=not i)
    model.plot_pred_accuracy(ax=ax[i, 0], alpha=0.1, show=False, legend=False)
    ax[i, 1].set_title(f"Epistasis fit (lasso {lasso_shift})")
    ax[i, 0].set_title(f"Accuracy (lasso {lasso_shift})")

plt.show()

Here, the model object is only initialized once, so a single set of parameters is re-fit for each lasso strength in the loop: every fit after the first starts from the previous fit's parameters rather than from a fresh initialization. To fix this, we simply need to move the model = multidms.Model(data) instantiation inside the loop, so that each lasso strength is fit independently.
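A toy illustration of why this matters (hypothetical code, not the multidms API): if a fit runs a limited number of optimization steps and is warm-started from the previous lasso weight's parameters, the result at a given weight depends on which other weights were swept before it. Here a minimal proximal-gradient "model" for f(x) = (x - 3)^2 + lam * |x| is fit either by reusing one instance across the sweep (the buggy pattern) or with a fresh instance per weight (the fixed pattern).

```python
def soft_threshold(v, t):
    """Proximal operator of t * |x| (the lasso penalty)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

class ToyLassoModel:
    """Minimizes (x - 3)^2 + lam * |x| by proximal gradient descent."""

    def __init__(self):
        self.x = 0.0  # parameter, initialized once per model instance

    def fit(self, lam, steps=5, lr=0.1):
        # Deliberately few steps so the fit does not fully converge,
        # mimicking an expensive model fit within a parameter sweep.
        for _ in range(steps):
            grad = 2.0 * (self.x - 3.0)
            self.x = soft_threshold(self.x - lr * grad, lr * lam)
        return self.x

lasso_weights = [0.1, 1.0]

# Buggy pattern: one model reused across the sweep, so each fit
# warm-starts from the previous weight's parameters.
warm_model = ToyLassoModel()
warm_results = [warm_model.fit(lam) for lam in lasso_weights]

# Fixed pattern: a fresh model per lasso weight, so fits are independent.
cold_results = [ToyLassoModel().fit(lam) for lam in lasso_weights]

# The first fit is identical (both start from x = 0.0), but the fit at
# lam = 1.0 differs between the two schemes because the warm start
# carries over the parameters from the lam = 0.1 fit.
print(warm_results, cold_results)
```

This is the same mechanism behind the sweep-dependent results discussed below: until each fit converges fully (or is re-initialized), the estimate at one lasso weight inherits state from the weights fit before it.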

jbloom linked a pull request on Jan 10, 2024 that will close this issue
jbloom (Contributor) commented Jan 10, 2024

I merged the pull request that fixes this. But I would note that, although I agree the new code makes sense, I'm not sure the prior code was truly a bug. Starting each lasso fit from the parameters fit at the prior weight (a warm start) is also an arguably valid approach, even if we think fitting each weight independently is better, right? Or is there a reason why it was, strictly speaking, a bug before?

Caleb-Carr (Contributor) commented Jan 10, 2024

@jbloom What I noticed is that the results varied substantially depending on which values were included in the lasso sweep, even though the 'final' analysis used the same lasso weight. I attached an image showing how different lasso sweeps change the final analysis despite each sweep ending at the same 'final' lasso weight. I think the main downside is that this could be confusing: you would expect similar results for a particular lasso weight, but they differ a lot depending on the other weights tested.
[Attached image: final analyses from different lasso sweeps, each using the same final lasso weight]

@jbloom
Copy link
Contributor

jbloom commented Jan 10, 2024

Anyway, it should have the behavior you want now.
