Releases: ryannduma/intermetallics_mod
Introducing ChemBoost
ChemBoost: SMACT-Based Intermetallics Classification Package
Overview
This release introduces ChemBoost, a machine learning package designed to enhance SMACT's filtering capabilities by classifying intermetallic compounds as metals or non-metals. The package combines XGBoost with materials science features to predict the class from compositional properties. It is still a work in progress and will evolve over time. Feel free to reach out with ideas and potential solutions to any challenges you encounter.
Key Features
- Advanced Feature Engineering:
  - Valence Electron Count (VEC) calculations
  - Electronegativity difference metrics
  - Atomic concentration analysis
  - Magpie feature integration
- Robust Model Training:
  - XGBoost classifier with GridSearchCV optimization
  - Hyperparameter tuning
  - Cross-validation support
- Comprehensive Visualization:
  - Confusion matrix plotting
  - SHAP value analysis for model interpretation
  - Feature importance visualization
  - Distribution analysis of key features
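
To make the feature-engineering items above more concrete, here is a minimal sketch of how a concentration-weighted valence electron count and an electronegativity difference could be computed from a composition. It is not ChemBoost's own code: the function names and the small lookup tables are assumptions made for illustration, and the package itself may instead draw element data from matminer, pymatgen, or SMACT.

```python
# Illustrative element data (assumption: a tiny hand-written table for this sketch).
# Valence counts use the common s+d (+p) convention; electronegativities are Pauling values.
VALENCE_ELECTRONS = {"Mg": 2, "Al": 3, "Si": 4, "Ti": 4, "Fe": 8, "Ni": 10}
PAULING_EN = {"Mg": 1.31, "Al": 1.61, "Si": 1.90, "Ti": 1.54, "Fe": 1.83, "Ni": 1.91}


def atomic_fractions(composition):
    """Convert element amounts, e.g. {"Ni": 3, "Al": 1}, to atomic fractions."""
    total = sum(composition.values())
    return {element: amount / total for element, amount in composition.items()}


def valence_electron_count(composition):
    """Concentration-weighted mean number of valence electrons per atom."""
    fractions = atomic_fractions(composition)
    return sum(x * VALENCE_ELECTRONS[element] for element, x in fractions.items())


def electronegativity_difference(composition):
    """Spread between the most and least electronegative elements present."""
    values = [PAULING_EN[element] for element in composition]
    return max(values) - min(values)


if __name__ == "__main__":
    for name, comp in {"NiAl": {"Ni": 1, "Al": 1}, "Mg2Si": {"Mg": 2, "Si": 1}}.items():
        vec = valence_electron_count(comp)
        delta_en = electronegativity_difference(comp)
        print(f"{name}: VEC={vec:.2f}, delta_EN={delta_en:.2f}")
```

Descriptors like these would form a few columns of the feature matrix, alongside the Magpie features listed above.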
Package Structure
chemboost-classification/
├── src/ # Core package code
│ ├── data/ # Data loading utilities
│ ├── features/ # Feature engineering
│ ├── models/ # ML model implementation
│ └── visualisation/ # Plotting utilities
├── notebooks/ # Analysis notebooks
└── data/ # Dataset storage
Dependencies
- Python 3.11+
- XGBoost 2.1.3
- scikit-learn 1.5.2
- matminer
- pymatgen
- SMACT 2.8+
Usage
The package can be installed and run using:
pip install -e .
python run.py
Technical Details
- Implements a modular, maintainable codebase
- Includes comprehensive documentation
- Provides both command-line and API interfaces
- Features extensive error handling and input validation
Testing
- Tested with Matbench experimental bandgap dataset
- Validated against known metallic/non-metallic compounds
- Cross-validated model performance metrics included
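
As a rough illustration of how cross-validated tuning of an XGBoost classifier with GridSearchCV can be set up, consider the sketch below. The parameter grid, the F1 scoring choice, the fold count, and the helper names `n_candidates` and `build_search` are all assumptions made for this example; the grid and metrics actually used for this release are not stated in these notes.

```python
# Hypothetical search space for this sketch; not the grid used by ChemBoost.
PARAM_GRID = {
    "max_depth": [3, 5],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300],
}


def n_candidates(grid):
    """Number of parameter combinations an exhaustive grid search will evaluate."""
    count = 1
    for values in grid.values():
        count *= len(values)
    return count


def build_search(param_grid=PARAM_GRID, n_splits=5, seed=0):
    """Return an unfitted GridSearchCV wrapping an XGBoost classifier."""
    # Imported lazily so the grid can be inspected without the ML stack installed.
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from xgboost import XGBClassifier

    # Stratified folds keep the metal / non-metal ratio similar in every split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    model = XGBClassifier(eval_metric="logloss", random_state=seed)
    return GridSearchCV(model, param_grid, scoring="f1", cv=cv, n_jobs=-1)


if __name__ == "__main__":
    print(f"{n_candidates(PARAM_GRID)} candidates x 5 folds")
```

Given a feature matrix `X` and binary labels `y`, calling `build_search().fit(X, y)` and then reading `best_params_` and `best_score_` gives the selected hyperparameters and their mean cross-validated score.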
Future Enhancements
- More robust feature handling, to improve the filter's ability to separate metals from non-metals
- Integration with SMACT's main filtering pipeline
- Support for more complex chemical environments
- Extended visualization options
Notes
- Requires scikit-learn <1.6, because API changes in later releases are currently incompatible with XGBoost
- Includes sample data for immediate testing
- Documentation available in README and docstrings
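
The scikit-learn constraint in the notes above can be checked before running the package. The following is a small sketch using only the standard library; the helper names are hypothetical and are not part of ChemBoost.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string):
    """Extract (major, minor) from strings such as '1.5.2' or '1.6rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def sklearn_version_ok(version_string):
    """True if the version satisfies the scikit-learn <1.6 requirement."""
    return major_minor(version_string) < (1, 6)


def installed_sklearn_ok():
    """Check the installed scikit-learn; returns None if it is not installed."""
    try:
        return sklearn_version_ok(version("scikit-learn"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("scikit-learn compatible:", installed_sklearn_ok())
```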