This project aims to classify astronomical objects into three categories: Stars, Galaxies, and Quasars (QSOs) using data from the Sloan Digital Sky Survey (SDSS). The dataset contains 100,000 observations, each described by 17 features and a class label. Machine Learning (ML) and Deep Learning (DL) models are implemented to achieve accurate classification.
Stellar Classification Dataset - SDSS17
- Source: Kaggle
- Features: 17 features including magnitudes and colors, with a target class (Star, Galaxy, QSO).
- Size: 100,000 observations.
fedesoriano. (January 2022). Stellar Classification Dataset - SDSS17. Retrieved [Date Retrieved] from Kaggle.
The data released by the SDSS is in the public domain. It is taken from the current data release, DR17. More information about the license: SDSS License Information
- Feature Engineering: Added derived features such as "G-I" (difference between the `g` and `i` magnitudes) and "Z-G" (difference between the `z` and `g` magnitudes).
- Feature Scaling: Standardized all feature values using `StandardScaler`.
- Class Balancing: Used class weights to address imbalanced data.
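The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the magnitude values are synthetic, the label strings (`GALAXY`, `QSO`, `STAR`) and variable names are assumptions, and the "balanced" weighting scheme is one common choice rather than a confirmed detail of this repository.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight

# Synthetic stand-in data (illustrative values, not taken from SDSS).
g = np.array([23.9, 22.1, 21.6, 19.4, 20.8, 22.5])
i = np.array([20.4, 21.8, 18.9, 18.3, 20.1, 21.0])
z = np.array([18.8, 21.6, 18.2, 17.9, 19.9, 20.7])
labels = np.array(["GALAXY", "GALAXY", "GALAXY", "STAR", "QSO", "GALAXY"])

# Feature engineering: color indices as differences between magnitudes.
g_minus_i = g - i   # "G-I"
z_minus_g = z - g   # "Z-G"
X = np.column_stack([g, i, z, g_minus_i, z_minus_g])

# Feature scaling: zero mean and unit variance per column.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Class balancing: weights inversely proportional to class frequency
# (assumed "balanced" scheme: n_samples / (n_classes * class_count)).
classes = np.unique(labels)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
class_weight = {str(c): float(w) for c, w in zip(classes, weights)}
print(class_weight)
```

In a full pipeline the scaler would usually be fit on the training split only and then applied to the test split, so that test statistics do not leak into training.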
- Multi-Layer Perceptron (MLP): Fully connected neural network with dropout layers for regularization.
- Convolutional Neural Network (CNN): 1D CNN with convolution and max-pooling layers, designed for sequential data.
- Long Short-Term Memory (LSTM): LSTM network to capture sequential dependencies in features.
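As an illustration of the first architecture, an MLP with dropout might look like the sketch below, written with the Keras API in TensorFlow. The layer widths, dropout rate, and the `num_features` value are assumptions chosen for the example, and `build_mlp` is a hypothetical helper name, not a function from this repository.

```python
import numpy as np
import tensorflow as tf

def build_mlp(num_features, num_classes=3, dropout_rate=0.3):
    """Fully connected classifier with dropout after each hidden layer."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(num_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        # One softmax output per class: Star, Galaxy, QSO.
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    # Categorical cross-entropy expects one-hot encoded labels.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# num_features=10 is a placeholder; use the real number of input columns.
model = build_mlp(num_features=10)
probs = model.predict(np.zeros((4, 10), dtype="float32"), verbose=0)
```

The CNN and LSTM variants would take the same tabular input reshaped to `(num_features, 1)`, since `Conv1D` and `LSTM` layers expect a sequence axis in addition to the batch axis.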
- Train-Test Split: 80% training, 20% testing.
- Validation Split: 5% of the training data used for validation.
- Metrics: Accuracy, confusion matrix.
- Optimization: Categorical cross-entropy loss with the Adam optimizer.
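The splitting scheme above could be reproduced along these lines. The data is synthetic, and the use of stratification and a fixed `random_state` are assumptions made for the example. Labels are one-hot encoded at the end because categorical cross-entropy expects that format.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))        # synthetic features
y = rng.integers(0, 3, size=1000)     # synthetic integer labels: 0, 1, 2

# 80% training, 20% testing (stratified so class proportions are preserved).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# 5% of the training data held out for validation.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.05, stratify=y_train, random_state=42)

# One-hot encode integer labels for categorical cross-entropy.
num_classes = 3
y_fit_onehot = np.eye(num_classes)[y_fit]
y_val_onehot = np.eye(num_classes)[y_val]

print(len(X_fit), len(X_val), len(X_test))
```

With Keras, the second split can alternatively be left to `model.fit(..., validation_split=0.05)`, which holds out the last 5% of the training arrays without shuffling or stratifying them.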
Each model’s performance was evaluated, and the best-performing model was selected based on accuracy. Confusion matrices were used to analyze the classification results further.
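A small sketch of this evaluation step, using scikit-learn's metrics on made-up predictions. The class ordering and the label arrays are illustrative assumptions, not results from this project.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

class_names = ["GALAXY", "QSO", "STAR"]        # assumed label order
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])    # illustrative ground truth
y_pred = np.array([0, 0, 1, 1, 1, 2, 0, 2])    # illustrative predictions

accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

# Per-class recall: correct predictions divided by the true count of each class.
recall = cm.diagonal() / cm.sum(axis=1)

print(f"accuracy = {accuracy:.2f}")
for name, row, r in zip(class_names, cm, recall):
    print(f"{name:>6}: {row}  recall = {r:.2f}")
```

Reading the matrix row by row shows which classes are confused with each other, which accuracy alone hides when the classes are imbalanced.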
- Clone the repository.
- Install required Python libraries: `scikit-learn`, `tensorflow`, `numpy`, `matplotlib`.
- Load the dataset and follow the preprocessing steps in the code.
- Train models and evaluate results.
- Use the best-performing model for predictions on new data.
This project uses publicly available SDSS data, which is in the public domain. For more details, refer to the SDSS License Information.