
✨ Materials Informatics ✨

A collection of toy examples of various materials informatics techniques.

About · Background · Techniques · Usage

[portfolio screenshot]

About

This repository contains both Jupyter notebooks for each materials informatics technique and the source code for my personal portfolio.

Background

Materials informatics is the application of data science and informatics techniques, including artificial intelligence and machine learning, to better understand and design materials. Many aspects of materials science and engineering make the application of these techniques unique and, in many cases, different from mainstream data science. Much of this uniqueness stems from the nature of materials-related datasets, which tend to be some combination of small, sparse, biased, clustered, and low quality.

Techniques

Transfer learning

Transfer learning is a machine learning (ML) approach in which knowledge gained from a pre-trained model is used in the training of another model. Usually the pre-trained model is trained on a higher-quality or more general dataset, and the target model is trained on a lower-quality or more specific dataset; the goal is to speed up and/or improve learning on the target task. There are many transfer learning methods; two examples are: i) the latent variable (LV) approach, where the output of the pre-trained model is used as an input feature to the target model, and ii) the fine-tuning (FT) approach, where some optimized parameters of the pre-trained model are used to initialize the parameters of the target model.

I currently have code only for the LV approach; see here for the corresponding Jupyter notebook and here for an overview on my portfolio page. All models were built with scikit-learn's RandomForestRegressor.
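
To illustrate the idea, here is a minimal sketch of the LV approach with RandomForestRegressor. The synthetic arrays and names below are hypothetical placeholders, not the data or code used in the notebook:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: (X_source, y_source) plays the role of the larger,
# higher-quality source dataset; (X_target, y_target) the smaller target one.
rng = np.random.default_rng(0)
X_source = rng.random((500, 5))
y_source = X_source.sum(axis=1) + rng.normal(0, 0.1, size=500)
X_target = rng.random((40, 5))
y_target = X_target.sum(axis=1) + 0.5 + rng.normal(0, 0.1, size=40)

# 1. Train the source model on the source task.
source_model = RandomForestRegressor(random_state=0).fit(X_source, y_source)

# 2. LV approach: use the source model's prediction as an extra
#    input feature (the "latent variable") for the target model.
lv = source_model.predict(X_target).reshape(-1, 1)
X_target_lv = np.hstack([X_target, lv])

# 3. Train the target model on the augmented feature matrix.
target_model = RandomForestRegressor(random_state=0).fit(X_target_lv, y_target)
```

The key design choice is that the target model never sees the source data directly; it only sees the source model's predictions, which act as a physics- or chemistry-aware summary feature.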

Active learning

Active learning is a machine learning (ML) approach in which a model's predictions and uncertainties are used to decide which data point to evaluate next in a search space. In the context of materials and chemicals, active learning is often used to guide design and optimization. To decide which material to make or which experiment to run next, we use acquisition functions to either explore or exploit the search space. For example, if we seek to optimize a particular chemical property, we would use an exploitative acquisition function that prioritizes compounds predicted to have property values close to our target value. On the other hand, if we want to explore the search space and diversify our training data, we would use an explorative acquisition function that prioritizes compounds for which the model is most uncertain. For more information on acquisition functions, see here and here.

See here for the corresponding Jupyter notebook and here for an overview on my portfolio page. All models were built with GPflow's GPR.
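
To make the two acquisition strategies concrete, here is a minimal sketch using GPflow's GPR on a toy 1-D search space. The candidate pool, target value, and training data below are hypothetical placeholders:

```python
import numpy as np
import gpflow

# Hypothetical toy data: a few labeled points and a pool of unlabeled candidates.
X_train = np.array([[0.1], [0.4], [0.9]])
y_train = np.sin(6 * X_train)
X_pool = np.linspace(0, 1, 100).reshape(-1, 1)

# Fit a GP regression model and optimize its hyperparameters.
model = gpflow.models.GPR((X_train, y_train), kernel=gpflow.kernels.SquaredExponential())
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

# Predictive mean and variance over the candidate pool.
mean, var = model.predict_f(X_pool)
mean, var = mean.numpy().ravel(), var.numpy().ravel()

# Exploitative: pick the candidate whose prediction is closest to a target value.
target = 0.8
next_exploit = X_pool[np.argmin(np.abs(mean - target))]

# Explorative: pick the candidate where the model is most uncertain.
next_explore = X_pool[np.argmax(var)]
```

In a full active learning loop, the chosen candidate would be evaluated (by experiment or simulation), appended to the training set, and the model refit before the next selection.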

Physics-informed learning

Physics-informed learning is an ML approach in which physical knowledge is embedded into the learning process, usually via a differential equation. Most often, this is achieved by incorporating an additional loss into the training of a neural network, where the loss is defined by the differential equation. Here, I have adapted this example on cooling to solve Newton's law of cooling for the time-dependent temperature at various environmental temperatures $T_{env}$: $$\frac{dT(t)}{dt}=r(T_{env}-T(t)),$$ where $T(t)$ is the temperature at time $t$, $T_{env}$ is the environmental or surrounding temperature, and $r$ is the cooling rate. In this example, $r$ is a parameter that the network learns from the data, $T_{env}$ is provided by the user, and $T(t)$ is the output of the network.

To incorporate a physics-based loss into the network, we move all terms to one side of the equation: $$g=\frac{dT(t)}{dt}-r(T_{env}-T(t)) = 0.$$ Now we can take the physics-based loss to be the deviation of the predicted $g$ from $0$. The derivative term, $\frac{dT(t)}{dt}$, is easily obtained via automatic differentiation, the same machinery used for backpropagation.
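
Here is a minimal sketch of how this residual can be computed with PyTorch's autograd. The toy network architecture and the numerical values are my own placeholders, not the ones from the notebook:

```python
import torch
import torch.nn as nn

T_env = 25.0                          # environmental temperature, provided by the user
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
r = nn.Parameter(torch.tensor(1.0))   # learnable cooling rate

# Collocation points in time; gradients w.r.t. t are needed for dT/dt.
t = torch.linspace(0.0, 5.0, 50).reshape(-1, 1).requires_grad_(True)
T = net(t)

# dT/dt via autograd (each output depends only on its own t, so
# summing the gradients recovers the per-point derivative).
dT_dt = torch.autograd.grad(T, t, grad_outputs=torch.ones_like(T),
                            create_graph=True)[0]

# Physics residual g = dT/dt - r * (T_env - T); the loss drives g toward 0.
g = dT_dt - r * (T_env - T)
physics_loss = torch.mean(g ** 2)
```

During training, `physics_loss` would be added to the usual data-misfit loss before the backward pass, with `r` included among the parameters handed to the optimizer.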

See here for the corresponding Jupyter notebook and here for an overview on my portfolio page. All models were built with PyTorch.

Usage

Running the notebooks

To install the packages needed to run the Jupyter notebooks, use pip with requirements_notebooks.txt:

pip install -r requirements_notebooks.txt

Note that the notebooks often import data from the data directory, as well as modules from Python scripts located in notebooks.

Running the Streamlit app

To install the packages needed to run the Streamlit app, use pip with requirements_streamlit.txt:

pip install -r requirements_streamlit.txt

Note that the app connects to a Google Cloud bucket to access data; the code in src will need to be significantly modified to run the app locally, with or without a Google Cloud bucket.
