We ultimately don't care about the strict architecture comparisons typical of other ML benchmarks. We care about measuring how good ML (in any form) is at OOD materials stability prediction. If some models (interatomic potentials) are trained on forces and can therefore leverage more of the maximum training set released with our benchmark (the entirety of the MP v2022.10.28 database release), then that's a genuine advantage of force-full models for the real-world application we care about, and we want our benchmark to reflect it.

In short, we want to provide a walled garden for asking system-level questions that a traditional ML benchmark is too rigid to answer. I believe we succeeded at that: Matbench Discovery clearly demonstrated that universal interatomic potentials emulating DFT relaxation are the winning methodology for high-throughput OOD materials stability prediction.
Also, training set size and overfitting are two different concepts, which you seem to be conflating. What's more, the empirical evidence suggests overfitting is a non-issue with large models.
@janosh Since GNoME is listed with an unknown, large training set, do you think it would make sense to add one of our CGAT models trained on Alexandria (we would have to remove the WBM data first)? However, we usually only train our models for E_hull prediction, since that is the only target required for materials discovery. Of course, one issue with that is that our convex hull is much more complete, which will increase some errors.
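To make that last point concrete: the same predicted energy yields a different energy above the convex hull depending on how many competing phases the reference hull contains. Below is a minimal sketch of that effect, assuming pymatgen's `PhaseDiagram` API; the compositions and total energies are made-up numbers purely for illustration, not values from Alexandria or MP.

```python
# Minimal sketch of why a more complete convex hull changes E_hull values.
# Assumes pymatgen's PhaseDiagram API; all compositions and total energies
# below are hypothetical and chosen only to illustrate the effect.
from pymatgen.analysis.phase_diagram import PDEntry, PhaseDiagram
from pymatgen.core import Composition

# Hypothetical candidate whose stability we want to judge
candidate = PDEntry(Composition("LiFeO2"), -26.0)

# Sparse reference hull: elemental phases plus a single binary
sparse = [
    PDEntry(Composition("Li"), -1.9),
    PDEntry(Composition("Fe"), -8.4),
    PDEntry(Composition("O2"), -9.8),
    PDEntry(Composition("Li2O"), -14.3),
]

# More complete hull: extra competing phases pull the hull down
complete = sparse + [
    PDEntry(Composition("Fe2O3"), -38.0),
    PDEntry(Composition("LiFe5O8"), -103.0),
]

for label, entries in (("sparse hull", sparse), ("complete hull", complete)):
    phase_diagram = PhaseDiagram(entries + [candidate])
    e_hull = phase_diagram.get_e_above_hull(candidate)
    print(f"{label}: E_hull = {e_hull:.3f} eV/atom")
```

With the extra competing phases included, a candidate that sat on the sparse hull ends up above the more complete one, which is the sense in which a more complete hull shifts the E_hull targets a model is scored against.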