This project develops machine learning models to predict PFAS (Per- and Polyfluoroalkyl Substances) contamination levels in U.S. water systems for 2024 using historical data from 2001 to 2023. The project employs time series analysis and machine learning techniques to capture patterns in PFAS contamination.
- ✅ Initial data loading and preprocessing implemented
- ✅ Basic feature engineering completed
- ✅ Preliminary model development with XGBoost
- ✅ Basic prediction functionality for key PFAS compounds
- ✅ Initial MAE metrics for model evaluation
├── data/
│ └── UCMR_data.csv
├── notebooks/
│ └── PFAS_Analysis.ipynb
├── src/
│ ├── data_preprocessing.py
│ ├── feature_engineering.py
│ └── model.py
├── requirements.txt
└── README.md
Current model predictions for key PFAS compounds:
| | PFOA | 0.004000 | 0.001669 | ~0.003366 | | PFOS | 0.004000 | 0.001668 | ~0.003184 | | PFHxS | 0.003000 | 0.001699 | ~0.002439 | | PFNA | 0.004000 | 0.001582 | ~0.003103 | | PFBS | 0.003000 | 0.004998 | ~0.002850 |
- Python 3.8+
- pandas
- numpy
- scikit-learn
- XGBoost
- matplotlib/seaborn
-
Data Analysis
- Implement comprehensive EDA
- Add regional analysis
- Create data visualizations
-
Feature Engineering
- Add seasonal components
- Include geospatial features
- Create advanced lag variables
-
Model Improvements
- Implement additional models (LSTM, Random Forest)
- Add cross-validation
- Enhance prediction intervals
-
Analysis
- Regional performance analysis
- Feature importance analysis
- Error analysis by region
-
Visualization
- Interactive dashboard
- Regional map visualizations
- Trend analysis plots
-
Data Usage
- Historical data from 2001-2023
- UCMR 1-4 Data
- UCMR 5 Data (2023)
- EPA Envirofacts
-
Model Requirements
- Time series analysis
- Forecasting for 2024
- Regional analysis
-
Evaluation Metrics
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Regional performance analysis
# Clone the repository
git clone [your-repo-url]
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install requirements
pip install -r requirements.txt
# Example code for basic usage
from src.model import PFASPredictor
# Initialize predictor
predictor = PFASPredictor()
# Load and prepare data
X, y = predictor.prepare_features(df, 'PFOA')
# Train model
model, mae = predictor.train_model(X, y)
# Generate predictions
predictions = predictor.predict_2024(model, mrl)
- Predictions show limited seasonal variation
- Regional analysis not yet implemented
- Need to incorporate more sophisticated feature engineering
- Error analysis needs expansion
- Visualization components pending
- Implement regional analysis functionality
- Add seasonal components to predictions
- Create comprehensive visualization dashboard
- Expand error analysis
- Add LSTM model implementation
- Anudeep Adiraju
- Course: AI Practicum
- University: The University of Texas at San Antonio
Apache 2.0
- EPA for UCMR data
- UTSA AI Practicum course staff
Last Updated: November 13, 2024