
In this repo, I implement a decision tree classifier from scratch using python and pandas. I use this decision tree to detect gender based on names. I use the first letter, first two letters, first three letters, last letter, last two letters, last three letters, etc. as features.


mshadloo/Gender-Detection-with-Decision-Tree-from-scrach


Gender Detection with Decision Tree from Scratch

In this repo, I implement a decision tree classifier from scratch. Decision trees are built with a recursive divide-and-conquer algorithm:

  • I select the feature with the highest information gain for the root node and create a branch for each of the two possible outcomes of the test (the instance has or doesn't have that feature).
  • I split the instances into two subsets, one for each branch extending from the node.
  • I repeat the procedure recursively for each branch, using only the instances that reach that branch.
  • I stop the recursion for a branch when all its instances have the same class, or when the branch reaches a given maximum depth, where I cut the tree off.
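The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the code in main.py: the names `entropy`, `information_gain`, `build_tree`, and `predict` are hypothetical, each instance is assumed to be a set of feature strings, and the leaf label is assumed to be the majority class of the instances that reach it.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    probs = [n / total for n in Counter(labels).values()]
    return sum(p * math.log2(1 / p) for p in probs)


def information_gain(rows, labels, feature):
    """Entropy reduction from testing whether each row has `feature`."""
    has = [y for x, y in zip(rows, labels) if feature in x]
    lacks = [y for x, y in zip(rows, labels) if feature not in x]
    total = len(labels)
    remainder = (len(has) / total) * entropy(has) \
        + (len(lacks) / total) * entropy(lacks)
    return entropy(labels) - remainder


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Recursively build a binary tree; leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure, the depth limit is hit, or no tests remain.
    if len(set(labels)) == 1 or depth >= max_depth or not features:
        return majority
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    yes = [(x, y) for x, y in zip(rows, labels) if best in x]
    no = [(x, y) for x, y in zip(rows, labels) if best not in x]
    # A test that sends every instance down one branch is useless.
    if not yes or not no:
        return majority
    rest = [f for f in features if f != best]
    return {
        "feature": best,
        "yes": build_tree([x for x, _ in yes], [y for _, y in yes],
                          rest, depth + 1, max_depth),
        "no": build_tree([x for x, _ in no], [y for _, y in no],
                         rest, depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        tree = tree["yes"] if tree["feature"] in row else tree["no"]
    return tree


# Tiny made-up example: four instances, two candidate features.
rows = [{"last1=a"}, {"last1=a", "first1=m"}, {"last1=o"}, {"last1=n", "first1=m"}]
labels = ["F", "F", "M", "M"]
tree = build_tree(rows, labels, ["last1=a", "first1=m"])
```

In this toy data the test on `last1=a` separates the two classes perfectly, so it is chosen for the root and both branches become leaves right away.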

Dataset

I evaluate the decision tree on gender detection from names. I use the NationalNames data (https://www.kaggle.com/kaggle/us-baby-names?select=NationalNames.csv), which is released by data.gov.

How to run:

git clone https://github.com/mshadloo/Gender-Detection-with-Decision-Tree-from-scrach.git
cd Gender-Detection-with-Decision-Tree-from-scrach
download the dataset from https://www.kaggle.com/kaggle/us-baby-names?select=NationalNames.csv
python main.py 

Experiment

First, I extract features from the names using some heuristics: the first letter, first two letters, first three letters, last letter, last two letters, last three letters, etc. I build different trees by varying the maximum depth to which a tree can grow. As I expected, the experiments show that training accuracy increases as the maximum depth increases.
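As a rough illustration of the prefix and suffix features listed above, the sketch below turns a name into a set of feature strings. The function name `extract_features`, the `first{k}=` / `last{k}=` naming scheme, and the lowercasing step are assumptions made for this example; the sketch covers only the six features named explicitly, not whatever else "etc." includes in the repo.

```python
def extract_features(name, max_len=3):
    """Return the first and last 1..max_len letters of `name` as features."""
    name = name.strip().lower()
    features = set()
    for k in range(1, max_len + 1):
        # Skip prefixes/suffixes longer than the name itself.
        if len(name) >= k:
            features.add(f"first{k}={name[:k]}")
            features.add(f"last{k}={name[-k:]}")
    return features


anna = extract_features("Anna")
```

Each name then becomes an instance that either has or doesn't have a given feature, which matches the binary tests used at the tree nodes.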
