This project investigates the use of machine learning (ML) techniques to model and predict interatomic potentials in silicon-metal alloy systems. It combines Density Functional Theory (DFT) calculations with ML to obtain potential predictions that are both accurate and computationally efficient. Three ML models are trained on the generated datasets: Support Vector Regression (SVR), Gaussian Mixture Models (GMM), and Fully Connected Neural Networks (FCNN).
Install the required Python libraries listed in the `requirements.txt` file:
`pip install -r requirements.txt`
Run `data_collection.ipynb` to:
- Collect material properties using libraries such as `pymatgen` and `matminer`.
- Perform DFT calculations to generate potential energy surface data.
- Process and structure the data into CSV files for training.
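
The last step, structuring collected properties into CSV, can be sketched with the standard-library `csv` module. This is a minimal illustration, not the notebook's code: the column names in `FIELDS`, the helper `records_to_csv`, and the row values are all hypothetical placeholders, and the real CSV schema may differ.

```python
import csv
import io

# Hypothetical column names; the schema written by data_collection.ipynb may differ.
FIELDS = ["material_id", "formula", "energy_per_atom",
          "formation_energy_per_atom", "band_gap"]

def records_to_csv(records, fields=FIELDS):
    """Write a list of per-material property dicts to CSV text, one row per material.

    Keys not listed in `fields` are ignored; missing keys become empty cells.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore", restval="")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    return buf.getvalue()

# Dummy placeholder values for illustration only, not project data.
rows = [
    {"material_id": "demo-1", "formula": "Si", "energy_per_atom": -1.0,
     "formation_energy_per_atom": 0.0, "band_gap": 1.0},
    {"material_id": "demo-2", "formula": "NiSi2", "energy_per_atom": -2.0,
     "formation_energy_per_atom": -0.5},  # no band gap: written as an empty cell
]
csv_text = records_to_csv(rows)
```

Writing to a `StringIO` keeps the example self-contained; in practice the same `DictWriter` would be given a file opened with `newline=""`.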

The collected data fall into three groups:
- Material properties: energy per atom, formation energy per atom, band gap, and others.
- Categorical data: material classifications and labels.
- Featurized data: density features, XRD powder patterns, orbital field matrices, and DFT-generated data.
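
Before training, a featurized CSV has to be split into a feature matrix, a target vector, and train/test subsets. The sketch below does this with the standard library only, assuming every non-target column is numeric. The helpers `load_xy` and `split_train_test`, the column names, and the demo values are hypothetical; scikit-learn's `train_test_split` performs the same splitting role if the project uses it.

```python
import csv
import io
import random

def load_xy(csv_text, target):
    """Split CSV text into a numeric feature matrix X and a target vector y.

    Assumes every column other than `target` holds a number.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    feature_cols = [c for c in reader.fieldnames if c != target]
    X, y = [], []
    for row in reader:
        X.append([float(row[c]) for c in feature_cols])
        y.append(float(row[target]))
    return X, y, feature_cols

def split_train_test(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices with a fixed seed and hold out a test fraction."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(test_fraction * len(idx)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Dummy featurized table: two hypothetical XRD feature columns and a target column.
demo = (
    "xrd_0,xrd_1,formation_energy_per_atom\n"
    "0.1,0.9,-0.2\n"
    "0.4,0.6,-0.5\n"
    "0.7,0.3,-0.1\n"
    "0.2,0.8,-0.3\n"
    "0.5,0.5,-0.4\n"
)
X, y, cols = load_xy(demo, target="formation_energy_per_atom")
X_train, X_test, y_train, y_test = split_train_test(X, y)
```

Fixing the seed makes the split reproducible, so train and test RMSE values can be compared across models on identical subsets.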

Train three ML models on the generated datasets:
- `nn.py`: Trains a Fully Connected Neural Network (FCNN) to predict energy-related properties.
- `svr.py`: Trains a Support Vector Regression (SVR) model to predict formation and potential energies.
- `gmm.py`: Fits a Gaussian Mixture Model (GMM) to model energy distributions probabilistically.

Each script reports model performance metrics (e.g., RMSE).
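
The README does not say how `gmm.py` turns a fitted mixture into energy predictions. As a hedged illustration of the "probabilistically model energy distributions" step only, here is a minimal expectation-maximization (EM) fit of a one-dimensional Gaussian mixture to a set of energy values in NumPy. The function `fit_gmm_1d` and its parameters are hypothetical, the data are synthetic, and the project may instead rely on scikit-learn's `sklearn.mixture.GaussianMixture`.

```python
import numpy as np

def fit_gmm_1d(x, k=2, n_iter=200):
    """Fit a 1-D Gaussian mixture to the values in x by expectation-maximization.

    Returns (weights, means, variances), each an array of length k.
    """
    x = np.asarray(x, dtype=float)
    means = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread initial means over the data
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log of weight * Gaussian density for every (point, component) pair
        log_p = (np.log(weights) - 0.5 * np.log(2 * np.pi * variances)
                 - (x[:, None] - means) ** 2 / (2 * variances))
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)    # responsibilities; each row sums to 1
        # M-step: re-estimate mixture parameters from the responsibilities
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-6)    # keep variances from collapsing to 0
    return weights, means, variances

# Synthetic energies drawn from two well-separated Gaussians (not project data).
rng = np.random.default_rng(1)
energies = np.concatenate([rng.normal(-1.0, 0.1, 200), rng.normal(2.0, 0.1, 200)])
weights, means, variances = fit_gmm_1d(energies, k=2)
```

Working in log space avoids underflow when a point lies far from a component, which matters once the fitted variances become small.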

Run `plot.ipynb` to visualize:
- Actual vs. Predicted Potentials for each ML model.
- RMSE performance across models and datasets.
- Comparisons of training and testing RMSE for each technique.

The notebook reproduces the figures shown in the project report.

The models were evaluated using Root Mean Square Error (RMSE):
- Support Vector Regression (SVR): Achieved the lowest RMSE, with strong predictive performance and good generalization.
- Gaussian Mixture Models (GMM): Moderate RMSE, but generalized poorly, with test error well above training error.
- Neural Networks (NN): Highest RMSE, with a large gap between training and test error that indicates overfitting.

By dataset:
- XRD: Best performance for all models, particularly SVR.
- Orbital and Sine: SVR still outperformed the other models, though with slightly higher RMSE.
- DFT: The most challenging dataset for all models, with NN performing worst.
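
RMSE is the square root of the mean squared residual, RMSE = sqrt((1/n) · Σᵢ (yᵢ − ŷᵢ)²), so it is expressed in the same units as the predicted quantity. A minimal plain-Python version follows; the function name `rmse` is our own, and the project scripts may compute it differently (for example via scikit-learn).

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

Evaluating it once on the training split and once on the held-out split yields a train/test pair like the columns in the table; this is an assumption about how the reported numbers were produced.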

Train and test RMSE on the XRD dataset:

| Dataset | Model | Train RMSE | Test RMSE |
|---|---|---|---|
| XRD | SVR | 0.095 | 0.087 |
| XRD | GMM | 0.476 | 1.323 |
| XRD | NN | 0.731 | 1.874 |
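
One quick way to read the generalization gap in the XRD table is the ratio of test RMSE to train RMSE. The values below are copied from the table; the helper name `generalization_ratio` is hypothetical.

```python
# Train/test RMSE pairs copied from the XRD table.
xrd_rmse = {
    "SVR": (0.095, 0.087),
    "GMM": (0.476, 1.323),
    "NN": (0.731, 1.874),
}

def generalization_ratio(train, test):
    """Test RMSE divided by train RMSE; values well above 1 suggest overfitting."""
    return test / train

ratios = {model: round(generalization_ratio(*pair), 2) for model, pair in xrd_rmse.items()}
# {'SVR': 0.92, 'GMM': 2.78, 'NN': 2.56}
```

Both GMM and NN have test errors more than twice their training errors, whereas SVR's test RMSE is slightly below its training RMSE.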

This project was developed as part of the ME438 course at IIT Bombay, under the guidance of Prof. Amit Singh.