Releases: ryannduma/intermetallics_mod

Introducing ChemBoost

27 Dec 01:42
93c808e
Pre-release

ChemBoost: SMACT-Based Intermetallics Classification Package

Overview

This release introduces ChemBoost, a machine learning package designed to enhance SMACT's filtering capabilities by classifying intermetallic compounds as metals or non-metals. The package uses XGBoost together with materials science features to make predictions from compositional properties. It is still a work in progress and will evolve over time. Feel free to reach out with ideas or potential solutions to any challenges you encounter.

Key Features

  • Advanced Feature Engineering:

    • Valence Electron Count (VEC) calculations
    • Electronegativity difference metrics
    • Atomic concentration analysis
    • Magpie feature integration
  • Robust Model Training:

    • XGBoost classifier with GridSearchCV optimization
    • Hyperparameter tuning
    • Cross-validation support
  • Comprehensive Visualization:

    • Confusion matrix plotting
    • SHAP value analysis for model interpretation
    • Feature importance visualization
    • Distribution analysis of key features
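
The compositional features listed above can be illustrated with a minimal sketch. This is not ChemBoost's implementation: the function names and the two small lookup tables are assumptions made for illustration, and it assumes VEC is taken as the concentration-weighted mean of valence electrons and the electronegativity metric as the max-min spread on the Pauling scale. A real pipeline would read element data from a full database such as those available through pymatgen or SMACT.

```python
# Illustrative lookup tables covering only a handful of elements (assumption:
# valence electrons counted as outer s + d electrons, Pauling electronegativity).
VALENCE_ELECTRONS = {"Al": 3, "Ti": 4, "Fe": 8, "Ni": 10, "Cu": 11}
PAULING_EN = {"Al": 1.61, "Ti": 1.54, "Fe": 1.83, "Ni": 1.91, "Cu": 1.90}


def atomic_fractions(composition):
    """Convert element counts, e.g. {"Ni": 3, "Al": 1}, to atomic fractions."""
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must contain a positive number of atoms")
    return {el: n / total for el, n in composition.items()}


def valence_electron_count(composition):
    """Concentration-weighted mean number of valence electrons per atom."""
    fractions = atomic_fractions(composition)
    return sum(frac * VALENCE_ELECTRONS[el] for el, frac in fractions.items())


def electronegativity_difference(composition):
    """Spread (max minus min) of Pauling electronegativity over the elements."""
    values = [PAULING_EN[el] for el in composition]
    return max(values) - min(values)


def featurize(composition):
    """Hypothetical helper that bundles the features into one dictionary."""
    fractions = atomic_fractions(composition)
    return {
        "vec": valence_electron_count(composition),
        "en_diff": electronegativity_difference(composition),
        "max_fraction": max(fractions.values()),
    }


if __name__ == "__main__":
    # Ni3Al: three Ni atoms for every Al atom.
    print(featurize({"Ni": 3, "Al": 1}))
```

A dictionary per composition like this can be stacked into a feature matrix and passed to any scikit-learn compatible classifier.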

Package Structure

chemboost-classification/
├── src/ # Core package code
│ ├── data/ # Data loading utilities
│ ├── features/ # Feature engineering
│ ├── models/ # ML model implementation
│ └── visualisation/ # Plotting utilities
├── notebooks/ # Analysis notebooks
└── data/ # Dataset storage

Dependencies

  • Python 3.11+
  • XGBoost 2.1.3
  • scikit-learn 1.5.2
  • matminer
  • pymatgen
  • SMACT 2.8+

Usage

From the repository root, the package can be installed in editable mode and run using:

pip install -e .
python run.py

Technical Details

  • Implements a modular, maintainable codebase
  • Includes comprehensive documentation
  • Provides both command-line and API interfaces
  • Features extensive error handling and input validation

Testing

  • Tested with the Matbench experimental band gap dataset
  • Validated against known metallic/non-metallic compounds
  • Cross-validated model performance metrics included
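
As a rough sketch of how validation against known metals and non-metals might look, the snippet below derives a binary label from an experimental band gap and tallies a confusion matrix. It assumes a compound counts as a metal when its measured gap is zero. The function names, the tolerance parameter, and the toy values are all hypothetical and are not taken from the package or from any dataset.

```python
def label_from_band_gap(gap_ev, tol=0.0):
    """Return 1 (metal) if the gap is at most `tol` eV, otherwise 0 (non-metal)."""
    if gap_ev < 0:
        raise ValueError("band gap cannot be negative")
    return 1 if gap_ev <= tol else 0


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating 1 (metal) as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(counts):
    """Fraction of samples that were classified correctly."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


if __name__ == "__main__":
    # Toy values for illustration only, not measurements.
    toy_gaps_ev = [0.0, 0.0, 1.1, 3.4, 0.0]
    toy_predictions = [1, 0, 0, 1, 1]
    y_true = [label_from_band_gap(g) for g in toy_gaps_ev]
    counts = confusion_counts(y_true, toy_predictions)
    print(counts, accuracy(counts))
```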

Future Enhancements

  • More robust feature handling, to improve the filter's ability to separate metals from non-metals
  • Integration with SMACT's main filtering pipeline
  • Support for more complex chemical environments
  • Extended visualization options

Notes

  • Requires scikit-learn <1.6, because API changes in 1.6 are currently incompatible with XGBoost
  • Includes sample data for immediate testing
  • Documentation available in README and docstrings
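
The scikit-learn constraint in the notes above can be checked before training. The following is a minimal sketch using only the standard library. The helper names are hypothetical and not part of ChemBoost, and the check compares just the major and minor version numbers.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string):
    """Extract (major, minor) from strings such as '1.5.2' or '1.6rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def sklearn_is_supported(version_string):
    """True when the version satisfies the scikit-learn <1.6 requirement."""
    return major_minor(version_string) < (1, 6)


def check_installed_sklearn():
    """Return True/False for the installed version, or None if not installed."""
    try:
        installed = version("scikit-learn")
    except PackageNotFoundError:
        return None
    return sklearn_is_supported(installed)


if __name__ == "__main__":
    status = check_installed_sklearn()
    if status is None:
        print("scikit-learn is not installed")
    elif status:
        print("scikit-learn version is compatible")
    else:
        print("scikit-learn >=1.6 detected; install a version below 1.6")
```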