
Implementation of a logistic regression model with the gradient descent algorithm


oph-design/dslr


Data Science & Logistic Regression


Table of Contents
  1. About The Project
  2. Getting Started
  3. Examples
  4. Contact

About The Project


DSLR is a collection of Python programs that visualise and analyse the Hogwarts dataset.
The dataset consists of Hogwarts students as data points, each with a house and grades.
The goal was to analyse the data and build a logistic regression model that predicts the house of any student.

The programs are:
describe.py: replication of the pandas describe function
histogram.py: plotting the histogram for every subject
scatter_plot.py: plotting the comparison between each pair of subjects
pair_plot.py: plotting a pair plot for the entire dataset
logreg_train.py: training the regression model
logreg_predict.py: making predictions with the trained model
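As a rough illustration of what replicating the pandas describe function involves, the sketch below computes the same summary statistics for a single numeric column using only the standard library. The names `percentile` and `describe_column` are hypothetical and not taken from describe.py. It assumes the pandas defaults: missing values are skipped, the standard deviation is the sample one (dividing by n - 1), and percentiles use linear interpolation.

```python
import math


def percentile(sorted_values, fraction):
    """Percentile of a pre-sorted list using linear interpolation."""
    position = (len(sorted_values) - 1) * fraction
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return float(sorted_values[lower])
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def describe_column(values):
    """Summary statistics for one numeric column, skipping None and NaN."""
    data = sorted(v for v in values if v is not None and not math.isnan(v))
    count = len(data)
    if count == 0:
        raise ValueError("column has no numeric values")
    mean = sum(data) / count
    # Sample standard deviation (n - 1 in the denominator); undefined for n = 1.
    if count > 1:
        std = math.sqrt(sum((v - mean) ** 2 for v in data) / (count - 1))
    else:
        std = float("nan")
    return {
        "count": count,
        "mean": mean,
        "std": std,
        "min": float(data[0]),
        "25%": percentile(data, 0.25),
        "50%": percentile(data, 0.50),
        "75%": percentile(data, 0.75),
        "max": float(data[-1]),
    }


if __name__ == "__main__":
    # Made-up grades with two missing entries.
    for name, value in describe_column([4, float("nan"), 1, None, 3, 2]).items():
        print(f"{name:>5}: {value:.4f}")
```

Applying such a function to every numeric column of the dataset would give one row per statistic and one column per subject, which is the layout pandas prints.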

For training, classic gradient descent is used. The model uses only a few features, chosen on the basis of the preceding analysis: Ancient Runes, Defense Against the Dark Arts and Herbology. These features were chosen because the distribution of each one splits in two, with two houses scoring high and two houses scoring low. Combining these splits creates a matrix that identifies a student's house with almost 100% accuracy. The training is designed to find this pattern and reaches 99% accuracy on the test dataset.
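The idea described above can be sketched in a few lines of plain Python. This is not the code from logreg_train.py. It assumes a one-vs-rest setup (one binary classifier per house), which the text does not confirm, and all function names are hypothetical. The toy data is invented: two standardised features that each split the houses into a high and a low group, so that the combination of both splits identifies the house.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, x):
    """Linear score: weights[0] is the bias, the rest match the features."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def train_binary(rows, labels, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss of one binary classifier."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = sigmoid(score(weights, x)) - y
            grad[0] += error
            for j, xi in enumerate(x):
                grad[j + 1] += error * xi
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def train_one_vs_rest(rows, houses, **kwargs):
    """One binary model per house: that house (1) against all others (0)."""
    return {
        house: train_binary(rows, [1.0 if h == house else 0.0 for h in houses], **kwargs)
        for house in sorted(set(houses))
    }


def predict(models, x):
    """Pick the house whose classifier gives the highest probability."""
    return max(models, key=lambda house: sigmoid(score(models[house], x)))


if __name__ == "__main__":
    # Invented toy data: the house-to-quadrant assignment is made up.
    rows = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
    houses = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
    models = train_one_vs_rest(rows, houses)
    print(predict(models, (0.8, -0.9)))
```

Each feature alone only narrows the choice to two houses, so a second feature with a different split is needed to settle the prediction. That is why a small, well-chosen feature set can be enough.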

Getting Started

The following describes how to use the programs.

Prerequisites

To run the programs you need python3 and pip3 installed. See an installation guide here. Afterwards, clone the repository and install the libraries used in this project.

 git clone https://github.com/oph-design/dslr
 pip3 install -r requirements.txt

⚠️ The recommended Python version is 3.11.5.

Usage

You can start each program from the list below by typing python3 followed by the path to the program. It is recommended to start each program from the root of the repository to avoid path issues. For convenience, controller.py provides a CLI for starting the programs. It takes the program you want to use as its first input and then asks for further input, using a default value if you leave a prompt blank.

python3 controller.py
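The prompt behaviour described above (a blank answer falls back to a default) could look roughly like the sketch below. It is not the code from controller.py. The helper name `prompt_with_default`, the prompt text and the default path are assumptions made for illustration. The `reader` parameter exists only so the function can be tried without typing input.

```python
def prompt_with_default(label, default, reader=input):
    """Ask for a value and fall back to `default` when the answer is blank."""
    answer = reader(f"{label} [{default}]: ").strip()
    return answer if answer else default


if __name__ == "__main__":
    # Hypothetical prompt; the default path is an assumption.
    dataset = prompt_with_default("Path to dataset", "datasets/dataset_train.csv")
    print(f"Using dataset: {dataset}")
```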

The following table lists the programs and their arguments:

| Program | ARG1 | ARG2 | ARG3 |
| --- | --- | --- | --- |
| describe.py | Path to dataset | - | - |
| histogram.py | Path to dataset | Subject to show (shows all if empty) | - |
| scatter_plot.py | Path to dataset | Subject to show | Subject to compare to |
| pair_plot.py | Path to dataset | - | - |
| logreg_train.py | Path to dataset | Optimisation algorithm (GD, stochastic GD, mini-batch GD) | - |
| logreg_predict.py | Path to dataset | - | - |
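The three optimisation algorithms accepted by logreg_train.py differ mainly in how many samples feed each weight update. The sketch below illustrates that difference and is not the repository's implementation. The algorithm keys `"gd"`, `"sgd"` and `"mini-batch"`, the function names and the mini-batch size of 32 are all assumptions.

```python
import random


def batch_size_for(algorithm, n_samples, mini_batch_size=32):
    """Number of samples per update for each (hypothetically named) algorithm."""
    sizes = {
        "gd": n_samples,                               # all samples per update
        "sgd": 1,                                      # one sample per update
        "mini-batch": min(mini_batch_size, n_samples)  # a small subset per update
    }
    if algorithm not in sizes:
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    return sizes[algorithm]


def iter_batches(n_samples, batch_size, rng):
    """Yield shuffled index batches that together cover one epoch."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


if __name__ == "__main__":
    rng = random.Random(0)
    for algorithm in ("gd", "sgd", "mini-batch"):
        size = batch_size_for(algorithm, n_samples=100)
        updates = sum(1 for _ in iter_batches(100, size, rng))
        print(f"{algorithm}: {updates} weight updates per epoch")
```

In a training loop, the gradient would be computed over the rows selected by each batch instead of over the full dataset. Smaller batches give more frequent but noisier updates.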

Examples

Histogram

python3 visuals/histogram.py datasets/dataset_train.csv

Scatterplot 1 Argument

python3 visuals/scatter_plot.py datasets/dataset_train.csv Herbology

Scatterplot 2 Arguments

python3 visuals/scatter_plot.py datasets/dataset_train.csv Flying "Ancient Runes"

Pairplot

python3 visuals/pair_plot.py datasets/dataset_train.csv

Contact

Ole-Paul Heinzelmann
[email protected]
