Oil Well Cluster Predictor

Overview

This project aims to predict the cluster affiliation of oil wells based on time series production metrics using machine learning and deep learning techniques. The dataset consists of two main files: timeseries_data.csv containing the time series production metrics of various wells, and well_data.csv containing the cluster grouping of each well.
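As a rough illustration of how variable-length production series can be turned into fixed-length inputs for a classifier, the sketch below collapses each well's series into a few summary statistics. The column names (`well_id`, `step`, `oil_rate`) and the chosen statistics are assumptions made for this example only; the actual layout of timeseries_data.csv and the features built in src/ may differ.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical long-format layout: one row per well per time step.
# The real column names in timeseries_data.csv may differ.
SAMPLE = """well_id,step,oil_rate
well_1,0,100.0
well_1,1,90.0
well_1,2,80.0
well_2,0,50.0
well_2,1,50.0
well_2,2,50.0
"""


def summarize_wells(csv_file):
    """Collapse each well's series into a small fixed-length feature dict.

    Rows are assumed to be in time order within each well.
    """
    series = defaultdict(list)
    for row in csv.DictReader(csv_file):
        series[row["well_id"]].append(float(row["oil_rate"]))

    features = {}
    for well, values in series.items():
        features[well] = {
            "mean": statistics.fmean(values),
            "std": statistics.pstdev(values),
            "first": values[0],
            "last": values[-1],
        }
    return features


features = summarize_wells(io.StringIO(SAMPLE))
print(features["well_1"])
```

Each well ends up with the same number of features regardless of how long its series is, which is what a tabular classifier needs.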

Project Structure

dataset/:
- raw/: Contains the raw input data files, including timeseries_data.csv and well_data.csv.
- interm/: Holds intermediate data files, including the indices of the train and test samples used in the project.
models/: Directory containing the saved trained models.
notebooks/: Jupyter notebooks for project demo, exploratory data analysis (EDA), and error analysis.
scripts/: Scripts for various project tasks, such as train/test split, model training, evaluation, and prediction.
src/: Source code files are organized in this directory, including modules for data preprocessing, feature engineering, model training, and evaluation.

Setup Instructions

Download and Unzip:

Download the project zip file and extract its contents to your local machine:

unzip well-cluster-prediction.zip
cd well-cluster-prediction

Install Dependencies

Install the required Python dependencies:

    pip install -r requirements.txt

The following sections outline the process of training and evaluating a machine learning model for the oil well cluster predictor project.

Training Data Preparation

First, the training and test datasets are prepared by running the train_test_split.py script with the configuration file config.json. The resulting datasets are saved to the following locations:

Train dataset: ./dataset/interm/train.csv
Test dataset: ./dataset/interm/test.csv

!python scripts/train_test_split.py config.json
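The README does not say how train_test_split.py chooses its samples. With clusters of unequal size, one common option is a stratified split, which samples each class separately so that the test set keeps roughly the same class proportions. The following is a minimal standard-library sketch of that idea; the function name, the 20% test fraction, and the seed are illustrative assumptions, not the project's settings.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), sampling each class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take at least one test sample per class.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: three clusters of unequal size.
labels = [0] * 10 + [1] * 5 + [2] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Saving the returned index lists, rather than copies of the data, is one way to keep a split reproducible across experiments.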

Model Training

Next, the model is trained by running the run_experiment.py script with the same configuration file. The best model is identified along with its parameters and saved for future use. The best model obtained has the following configuration:

Preprocessing steps: StandardScaler
Classifier: RandomForestClassifier with balanced class weights and a maximum depth of 10
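For context on the "balanced" setting: scikit-learn computes balanced class weights as n_samples / (n_classes * count_of_class), so rarer clusters receive larger weights. The sketch below reproduces that formula with the standard library. The class counts are borrowed from the test-set support column of the classification report purely for illustration; the training-set counts are not given in this README.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in sorted(counts.items())
    }


# Illustrative counts only (taken from the test-set support values).
labels = [0] * 32 + [1] * 16 + [2] * 30 + [3] * 11
weights = balanced_class_weights(labels)
for cls, weight in weights.items():
    print(cls, round(weight, 3))
```

With these counts, the smallest cluster gets roughly three times the weight of the largest one.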

!python scripts/run_experiment.py config.json

Best parameters:

Pipeline(steps=[('preprocessor', StandardScaler()),
                ('clf',
                 RandomForestClassifier(class_weight='balanced',
                                        max_depth=10))])

Best score: 0.2758683098711358

Best model saved at: ./models/model_20240229152028.pkl
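The saved file name appears to embed a timestamp in the form YYYYMMDDHHMMSS. A minimal sketch of saving and reloading a model under such a name is shown below; `save_model` and `load_model` are hypothetical helpers written for this example, not functions from the project, and a plain dictionary stands in for the fitted pipeline.

```python
import pickle
import tempfile
from datetime import datetime
from pathlib import Path


def save_model(model, models_dir, now=None):
    """Pickle `model` to models_dir/model_<YYYYMMDDHHMMSS>.pkl."""
    now = now or datetime.now()
    path = Path(models_dir) / f"model_{now:%Y%m%d%H%M%S}.pkl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return path


def load_model(path):
    """Load a previously pickled model."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# A temporary directory and a dictionary stand in for ./models and the pipeline.
with tempfile.TemporaryDirectory() as tmp:
    stand_in = {"max_depth": 10}
    saved = save_model(stand_in, tmp, now=datetime(2024, 2, 29, 15, 20, 28))
    saved_name = saved.name
    restored = load_model(saved)

print(saved_name)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.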

Model Evaluation

The trained model is evaluated using the evaluation.py script with the configuration file config.json. The classification report and confusion matrix are generated to assess the model's performance.

!python scripts/evaluation.py config.json

Classification Report

              precision    recall  f1-score   support
           0       0.44      0.59      0.51        32
           1       0.30      0.19      0.23        16
           2       0.33      0.37      0.35        30
           3       0.67      0.18      0.29        11

    accuracy                           0.39        89
   macro avg       0.44      0.33      0.34        89
weighted avg       0.41      0.39      0.38        89

Confusion Matrix

[[19  1 12  0]
 [ 6  3  6  1]
 [15  4 11  0]
 [ 3  2  4  2]]
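The classification report can be recomputed from the confusion matrix, which is a useful consistency check. The sketch below assumes rows are true clusters and columns are predicted clusters (the scikit-learn convention, and consistent with the row sums matching the support column above).

```python
# Confusion matrix from the evaluation above: rows = true, columns = predicted.
CONFUSION = [
    [19, 1, 12, 0],
    [6, 3, 6, 1],
    [15, 4, 11, 0],
    [3, 2, 4, 2],
]


def per_class_metrics(matrix):
    """Return precision, recall, F1 and support for each class."""
    n = len(matrix)
    rows = []
    for k in range(n):
        tp = matrix[k][k]
        support = sum(matrix[k])                         # true members of class k
        predicted = sum(matrix[i][k] for i in range(n))  # predicted as class k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append(
            {"precision": precision, "recall": recall, "f1": f1, "support": support}
        )
    return rows


metrics = per_class_metrics(CONFUSION)
total = sum(sum(row) for row in CONFUSION)
accuracy = sum(CONFUSION[k][k] for k in range(len(CONFUSION))) / total
macro_f1 = sum(m["f1"] for m in metrics) / len(metrics)

for k, m in enumerate(metrics):
    print(k, round(m["precision"], 2), round(m["recall"], 2), round(m["f1"], 2), m["support"])
print("accuracy", round(accuracy, 2), "macro f1", round(macro_f1, 2))
```

The matrix also shows where the errors concentrate: clusters 0 and 2 are confused with each other far more often than with clusters 1 or 3.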

Prediction

Finally, the predict.py script applies the trained model to new data (new_predict_data.csv), again with the configuration file config.json. The predictions are returned as a dictionary that maps each well to its predicted cluster.

!python scripts/predict.py config.json new_predict_data.csv

Prediction results:

{'well_14': 'constant', 'well_9': 'multi'}
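The command above passes two positional arguments: the configuration file and the data file. A script with that interface could parse them as sketched below; the argument names `config` and `data` are assumptions made for this example, and the actual predict.py may handle its arguments differently.

```python
import argparse


def build_parser():
    """Two positional arguments, mirroring the documented command line."""
    parser = argparse.ArgumentParser(
        description="Predict well clusters (illustrative sketch)."
    )
    parser.add_argument("config", help="path to the JSON configuration file")
    parser.add_argument("data", help="path to the CSV file with wells to score")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["config.json", "new_predict_data.csv"])
print(args.config, args.data)
```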

Tasks

Bonus:

Develop a GUI and host it on github.io.
