Best haploRILs parameters #1

Open
GoliczGenomeLab opened this issue Oct 23, 2024 · 1 comment

Comments

@GoliczGenomeLab
Owner

GoliczGenomeLab commented Oct 23, 2024

Hi @jamonterotena,

Thanks for the update, that looks great.
I already tried out haploRILs this morning.
The functions work for my data (but I had to modify the path to haploRILs_function.R).

I would like to ask for your opinion on the parameter settings {nSnp}, {step} and {K}.
I have 2.7M SNPs in total, but I guess it would be better to use a subset of ~50K SNPs and compare the stability of the results.
So, what values would you recommend for ~50K SNPs across the 10 chromosomes of a maize dataset?

P.S. I got a lot of warning messages when running the code; maybe you can look into them:
"summarise() has grouped output by 'id', 'nSnp', 'K', 'blocksFiltered'. You can override using the .groups argument."

Best regards,
Yan-Cheng

Originally posted by @yan-cheng-lin in GoliczGenomeLab/haploMAGIC#2 (comment)

> So, what values would you recommend for ~50K SNPs across the 10 chromosomes of a maize dataset?

Hard to say. It depends on the resolution you aim to obtain, the size of your marker set and the genotyping error rate you expect in your data. I ran a simulation-based benchmarking analysis on haploRILs, which suggested that combinations of small window sizes, using low nSnp, with higher filtering controlled by K produce the best performance, especially in the presence of genotyping errors.

> The functions work for my data (but I had to modify the path to haploRILs_function.R).

Thanks for reporting the bug!

> P.S. I got a lot of warning messages when running the code; maybe you can look into them: "summarise() has grouped output by 'id', 'nSnp', 'K', 'blocksFiltered'. You can override using the .groups argument."

Thanks, I'm aware. dplyr::summarise() prints that annoying message. It's possible to deactivate it, but I found that it's less risky to let it happen. I will fix it at some point.
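For readers who want to quiet the message in their own scripts, here is a minimal sketch of the two standard dplyr mechanisms. It is not the haploRILs code: the `blocks` table, its values and the `meanLen` column are hypothetical, and only the grouping column names are borrowed from the message quoted above.

```r
library(dplyr)

# Hypothetical toy data; column names echo the reported message,
# the values are made up for illustration only.
blocks <- tibble(
  id   = c("RIL1", "RIL1", "RIL2", "RIL2"),
  nSnp = c(20, 20, 20, 20),
  K    = c(3, 3, 3, 3),
  len  = c(10, 12, 7, 9)
)

# Option 1: state the grouping of the result explicitly.
# With .groups = "drop" the result is ungrouped and no message is printed.
res <- blocks %>%
  group_by(id, nSnp, K) %>%
  summarise(meanLen = mean(len), .groups = "drop")

# Option 2: switch the informational message off for the whole session.
options(dplyr.summarise.inform = FALSE)
```

Option 1 changes what the call returns (an ungrouped tibble), so any downstream code that relied on the leftover grouping would need checking; Option 2 only hides the message and leaves results unchanged.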

@yan-cheng-lin

Hi @GoliczGenomeLab,

Thanks for the advice that small window sizes (low nSnp) combined with higher filtering controlled by K give the best performance. I will try it out following this principle.

Best regards,
YCL
