DSLR is a collection of Python programs that visualise and analyse the Hogwarts dataset.
The dataset consists of Hogwarts students as data points, each containing the student's house and grades.
The goal was to analyse the data and create a logistic regression model that predicts the house of any student.
The programs are:
describe.py
: replication of the pandas describe function
histogram.py
: plotting the histogram for every subject
scatter_plot.py
: plotting the comparison between each subject
pair_plot.py
: plotting a pair plot for the entire Dataset
logreg_train.py
: training the regression model
logreg_predict.py
: making a prediction using the trained model
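To illustrate what replicating the pandas describe function involves, here is a minimal sketch in plain Python. It is not the code from describe.py: the function name `describe_column` is hypothetical. It assumes the pandas defaults, namely that missing values are skipped, the standard deviation is the sample standard deviation, and quartiles use linear interpolation.

```python
import math


def describe_column(values):
    """Return count, mean, std, min, quartiles and max for one numeric column.

    Hypothetical helper for illustration. None and NaN entries are skipped.
    """
    data = sorted(v for v in values if v is not None and not math.isnan(v))
    n = len(data)
    if n == 0:
        raise ValueError("column has no numeric values")

    mean = sum(data) / n
    # Sample standard deviation (divides by n - 1); undefined for one value.
    if n > 1:
        std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    else:
        std = float("nan")

    def percentile(p):
        # Linear interpolation between the two closest ranks.
        pos = p * (n - 1)
        lower = math.floor(pos)
        upper = math.ceil(pos)
        return data[lower] + (data[upper] - data[lower]) * (pos - lower)

    return {
        "count": n,
        "mean": mean,
        "std": std,
        "min": data[0],
        "25%": percentile(0.25),
        "50%": percentile(0.50),
        "75%": percentile(0.75),
        "max": data[-1],
    }
```

Calling `describe_column` once per numeric column and printing the resulting dictionaries side by side gives a table similar to the one pandas prints.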
For training, the classic gradient descent method is used. The model focuses on just a few features chosen based on the previous analysis, namely Ancient Runes, Defense Against the Dark Arts and Herbology. These features were chosen because their distribution is split in two, with two houses having good grades and two houses having bad grades in each. Combined, they create a matrix that can tell a student's house with almost 100% accuracy. The training is designed to find this pattern and reaches 99% accuracy on the test dataset.
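The training idea can be sketched as one-vs-rest logistic regression with batch gradient descent: one binary classifier per house, and the house with the highest score wins. This is a minimal sketch in plain Python, not the code from logreg_train.py. The function names, the learning rate, the epoch count and the toy labels `A` to `D` are assumptions made for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # w[0] is the bias, w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_binary(X, y, lr=0.1, epochs=500):
    """Batch gradient descent for one binary classifier (labels 0 or 1)."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Step against the average gradient of the log loss.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """Train one classifier per class, each separating it from the rest."""
    return {
        c: train_binary(X, [1 if l == c else 0 for l in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Return the class whose classifier gives the highest probability."""
    return max(models, key=lambda c: score(models[c], x))


# Toy data: two features, each splitting the four classes into two groups.
X = [(1, 1), (2, 2), (1, 2), (1, -1), (2, -2), (1, -2),
     (-1, 1), (-2, 2), (-1, 2), (-1, -1), (-2, -2), (-1, -2)]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
models = train_one_vs_rest(X, labels)
```

In practice the features would usually be standardised before training so that a single learning rate works for all of them.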
The following describes how to use the programs.
To run the programs you need python3 and pip3 installed. See an installation guide here. Afterwards, clone the repository and install the libraries used in this project.
git clone https://github.com/oph-design/ft_linear_regression
pip3 install -r requirements.txt
You can start each program from the list below by typing python3
and the path to the program.
It is recommended to start each program from the root of the repository to avoid path issues.
For convenience, the program controller.py
was added to provide a CLI for starting the programs.
controller.py
takes as its first input the program you want to use and then asks for further input, using a default value if you leave the prompt blank.
python3 controller.py
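The prompt-with-default behaviour can be sketched as follows. This is only an illustration of the pattern, not the code from controller.py: the helper names `ask` and `build_command` and the prompt format are assumptions.

```python
def ask(prompt, default, input_fn=input):
    """Ask the user for a value and fall back to `default` on blank input.

    `input_fn` is injectable so the helper can be tested without a terminal.
    """
    answer = input_fn(f"{prompt} [{default}]: ").strip()
    return answer if answer else default


def build_command(program, args):
    """Build the argument list for launching one of the programs."""
    return ["python3", program, *args]
```

The resulting list could then be passed to `subprocess.run` to start the chosen program.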
Following is a list of the programs and their arguments:
Program | ARG1 | ARG2 | ARG3 |
---|---|---|---|
describe.py | Path to Dataset | - | - |
histogram.py | Path to Dataset | Subject to show (shows all if empty) | - |
scatter_plot.py | Path to Dataset | Subject to show | Subject to compare to |
pair_plot.py | Path to Dataset | - | - |
logreg_train.py | Path to Dataset | Optimisation algorithm (GD, stochastic GD, mini-batch GD) | - |
logreg_predict.py | Path to Dataset | - | - |
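The three optimisation algorithms accepted by logreg_train.py differ mainly in how many samples feed each gradient step. The sketch below shows that difference only. The function name `iter_batches`, the method strings `"GD"`, `"SGD"` and `"MBGD"`, and the default batch size are assumptions and may not match the arguments the program actually accepts.

```python
import random


def iter_batches(n_samples, method, batch_size=32, rng=None):
    """Yield lists of sample indices for one epoch.

    "GD"   -> one batch containing every sample
    "SGD"  -> one shuffled sample per batch
    "MBGD" -> shuffled batches of `batch_size` samples
    """
    if method not in ("GD", "SGD", "MBGD"):
        raise ValueError(f"unknown method: {method}")
    indices = list(range(n_samples))
    if method == "GD":
        yield indices
        return
    if rng is None:
        rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    rng.shuffle(indices)
    size = 1 if method == "SGD" else batch_size
    for start in range(0, n_samples, size):
        yield indices[start:start + size]
```

A training loop would compute the gradient on each yielded batch and update the weights once per batch, so GD makes one update per epoch while SGD makes one per sample.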
python3 visuals/histogram.py datasets/dataset_train.csv
python3 visuals/scatter_plot.py datasets/dataset_train.csv Herbology
python3 visuals/scatter_plot.py datasets/dataset_train.csv Flying "Ancient Runes"
python3 visuals/pair_plot.py datasets/dataset_train.csv
Ole-Paul Heinzelmann