2021 Matminer/ML Update (WIP) #149

Merged
merged 9 commits into from
Jul 28, 2021

ardunn
Contributor

@ardunn ardunn commented Jul 21, 2021

Issues closed

Changes

  • Updated matminer in requirements to newest version (breaking previous version)
  • Updated imports in lesson and exercise code cells that were broken by matminer's breaking changes in 0.7.x
  • Fixed various typos and cleaned up nasty-looking code outputs so they are more readable
  • Created a figrecipes PyPI package (it is no longer in matminer proper, but the teaching notebook relies on it) and added it to requirements. Hopefully this does not cause dependency hell; I tried to avoid that by taking @shyamd's == vs >= advice from here. Updated the notebook code to reflect the new figrecipes imports, and made the graphs look better
  • Replace Figrecipes with plotly express
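
The == vs >= pinning advice mentioned in the list above can be checked mechanically. The following is a minimal sketch, not part of this PR: the helper name `unpinned_requirements` and the version numbers in the example are hypothetical, and the regular expression only covers simple `name==version` lines.

```python
import re

# Matches an exact pin such as "name==1.2.3" (optionally with extras like name[extra]).
_EXACT_PIN = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=].*$")


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned with '=='."""
    loose = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not _EXACT_PIN.match(line):
            loose.append(line)
    return loose


# Hypothetical requirements, for illustration only.
example = ["matminer==0.7.3", "figrecipes>=0.1", "# a comment", "plotly"]
print(unpinned_requirements(example))
```

Exact pins keep a workshop environment reproducible, at the cost of having to bump versions by hand.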

Need feedback before proceeding

  • Remove automatminer code section: amm has not been updated in a while and does not work with the newest matminer, so it will almost certainly cause dependency problems unless it is updated; I likely will not be able to update amm before MPWorkshop starts. Maybe we can replace the current automatminer code section with just a brief infographic overview of what it does? Need @shyamd opinion
  • Add infographic section on matbench, including short interactive part for exploring the datasets on MPContribs; we can put the matbench docs under an MP domain if desired. Need @mkhorton @shyamd guidance on what to do here. Can add code section on how to use the matbench python package but am not sure if this is desired or not.

@shyamd
Contributor

shyamd commented Jul 21, 2021

Agreed on removing automatminer. The lesson was already long, and this gives you more breathing room on the rest to make sure people understand and to provide exercises that reinforce the learning process.

Can you clarify what a matbench lesson would cover? It's not clear to me how it fits in.

@ardunn
Contributor Author

ardunn commented Jul 21, 2021

I think that is up to @mkhorton as I am not 100% sure if he was hoping to show the website, perhaps under some mp domain e.g., matbench.materialsproject.org or similar. I am 100% ok either with including it or not, just depends on what you guys want.

If we were to include it, I was thinking of showing it at the end, once we are done examining the model in the notebook; something like

"if you are looking to benchmark models in a reproducible way or compare model performance across a variety of materials science tasks, the MP Matbench project has ML tasks/datasets specifically for this purpose. We also have an interactive website where you can compare model results and submit your own models on the tasks to appear on the leaderboard."

@shyamd
Contributor

shyamd commented Jul 21, 2021

No need to have it under the "MP" URL umbrella. Can you make it interactive?

@ardunn
Contributor Author

ardunn commented Jul 21, 2021

@shyamd I could, do you mean interactive in terms of having students run code themselves? Or just interacting with the website?

If it is the former, it is possible but it might be another instance of too much information. In addition to the website, using Matbench to record new results requires yet another python library with its own objects etc. Would require like another 5 code cells to just get the fundamentals of how to use it.

What is your thought on just having a quick postscript (like, three sentences or less of text) with an image and a quick guided tour of the matbench website instead? I.e., "this matbench thing is available and useful for X, Y, Z, here's the website and if you're interested there's comprehensive docs on how to use it"

@shyamd
Contributor

shyamd commented Jul 21, 2021

I don't think it's too useful. The best way to learn is by doing. Is there not a convenient function in matminer to get matbench data sets?

@ardunn
Contributor Author

ardunn commented Jul 21, 2021

Yeah, you can get them with the load_dataset function which we use throughout the lesson already. E.g.,

from matminer.datasets.dataset_retrieval import load_dataset

df = load_dataset("matbench_mp_gap")

Maybe I can show that?
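
As a minimal sketch of how the snippet above could be wrapped for a lesson cell: the helper `load_matbench_task` and its `loader` parameter are hypothetical (not part of matminer), while the `load_dataset` import path is the one quoted above. The default path needs matminer installed and downloads the dataset on first use.

```python
def load_matbench_task(name, loader=None):
    """Load one matbench dataset by name.

    `loader` is a hypothetical hook for swapping in a stub (e.g. in tests);
    by default matminer's load_dataset is imported lazily and used.
    """
    if not name.startswith("matbench_"):
        raise ValueError(f"{name!r} is not a matbench dataset name")
    if loader is None:
        # Same import path as in the snippet above.
        from matminer.datasets.dataset_retrieval import load_dataset as loader
    return loader(name)


# Example call (commented out because it needs matminer and a download):
# df = load_matbench_task("matbench_steels")
# print(df.shape)
```

Importing lazily keeps the cell cheap to define; the dataset is only fetched when the function is actually called.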

@shyamd
Contributor

shyamd commented Jul 21, 2021

Yup, that sounds good. Maybe even make it an exercise where you show them how to load data and have them recreate one of the benchmarks? They will have access to some computing on CoCalc, but it won't be a ton, so nothing too hard like re-training CGCNN.

@ardunn
Contributor Author

ardunn commented Jul 21, 2021

Having them recreate one of the whole 13-task benchmarks, even with like a super basic linear regression, is for sure going to nuke whatever compute CoCalc is allotting MP - just deserializing some of the larger datasets from disk takes a while.

Maybe we could just have them do a single, smaller task? Like the steels prediction, which is just 312 compositions. Then again, that requires explaining nested cross validation and such. Regardless, I could include a short section on just looking at the benchmark data and explaining how to use it.

I can plan on adding some short, interactive, matbench-flavored section into this PR this weekend. Will that give you enough time to review before the workshop?
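
For reference, the nested cross validation mentioned above can be sketched with index splitting alone. This is an illustration under stated assumptions, not the matbench implementation: the function names, the fold counts, and the seed are all hypothetical, and only the sample count (312) comes from the comment above.

```python
import random


def k_fold_indices(indices, n_splits, seed):
    """Shuffle `indices` and split them into `n_splits` (train, test) pairs."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_splits] for i in range(n_splits)]
    pairs = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        pairs.append((train, test))
    return pairs


def nested_cv_splits(n_samples, n_outer=5, n_inner=5, seed=0):
    """Yield (outer_train, outer_test, inner_pairs) for each outer fold.

    The inner pairs re-split only outer_train, so the outer test set is
    never seen while tuning hyperparameters.
    """
    for outer_train, outer_test in k_fold_indices(range(n_samples), n_outer, seed):
        inner_pairs = k_fold_indices(outer_train, n_inner, seed + 1)
        yield outer_train, outer_test, inner_pairs


# 312 is the size of the steels task; the fold counts are illustrative.
splits = list(nested_cv_splits(312))
```

Hyperparameters would be chosen on the inner (train, validation) pairs, and the final score reported only on each outer test fold.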

@shyamd
Contributor

shyamd commented Jul 21, 2021

Oh yeah, that's good timing. I'm gonna set up some slots over the next two weeks for people to practice, which is the key deadline.

@mkhorton
Member

> I think that is up to @mkhorton as I am not 100% sure if he was hoping to show the website, perhaps under some mp domain e.g., matbench.materialsproject.org or similar. I am 100% ok either with including it or not, just depends on what you guys want.

If you want to include it, I can definitely move this to an MP domain before the workshop. We can chat during our meeting next week. I will leave the decision on whether to include it in the workshop lesson to you.

@ardunn
Contributor Author

ardunn commented Jul 28, 2021

@shyamd just added a short primer on matbench, it's currently in a kind of "bonus" section which can be left out if we're running short on time. Lmk what you think!

# Conflicts:
#	requirements.txt
#	workshop/lessons/08_ml_matminer/matminer-notes.ipynb
@shyamd
Contributor

shyamd commented Jul 28, 2021

Can you enable it so that we can push to your branch? It's usually an option called "Allow edits from maintainers" that you have to check in this PR.

Edit: NVM my fault.

@shyamd shyamd merged commit 8671072 into materialsproject:master Jul 28, 2021
Development

Successfully merging this pull request may close these issues.

2021 Matminer/Machine Learning Comments
3 participants