Humor Detection and Generation with Deep Learning using PyTorch

lin-justin/humor


1. Detecting Humor

The data consists of 200,000 formal short texts (100k humorous, 100k not humorous). Here is the paper.

The purpose of this repository is to practice scripting, learn about deep learning in NLP, and complete a full project cycle of:

  • collecting data
  • preprocessing and cleaning text
  • structuring directories and scripts
  • implementing models
  • training and evaluating models
  • making it reproducible for others
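The preprocessing step in the repo is handled by preprocess.py (which uses texthero). As a rough illustration of what cleaning short texts typically involves, here is a minimal stand-in pipeline; the exact operations are assumptions, not texthero's actual defaults:

```python
import re

def clean_text(text: str) -> str:
    """Toy cleaning pipeline: lowercase, drop punctuation and digits,
    collapse whitespace. Illustrative only; the repo uses texthero."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)      # strip punctuation/digits
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace runs
    return text

print(clean_text("Why did the chicken cross the road?!  To get... to the OTHER side!"))
```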

Usage

Arguments and default hyperparameters:

python train.py --model="rnn" \
    --train_data_path="./data/train_clean.csv" \
    --test_data_path="./data/test_clean.csv" \
    --seed=1234 \
    --vectors="fasttext.simple.300d" \
    --max_vocab_size=750 \
    --batch_size=32 \
    --bidirectional=True \
    --dropout=0.5 \
    --hidden_dim=64 \
    --output_dim=1 \
    --n_layers=2 \
    --lr=1e-3 \
    --n_epochs=5 
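A parser for the flags above might be sketched as follows. The names and defaults mirror the usage example; the actual train.py implementation may differ:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Argument parser matching the flags in the usage example above."""
    p = argparse.ArgumentParser(description="Train a humor classifier")
    p.add_argument("--model", default="rnn", choices=["rnn", "bilstm"])
    p.add_argument("--train_data_path", default="./data/train_clean.csv")
    p.add_argument("--test_data_path", default="./data/test_clean.csv")
    p.add_argument("--seed", type=int, default=1234)
    p.add_argument("--vectors", default="fasttext.simple.300d")
    p.add_argument("--max_vocab_size", type=int, default=750)
    p.add_argument("--batch_size", type=int, default=32)
    # Note: type=bool is unreliable for CLI strings ("False" is truthy);
    # it works here only because we rely on the default.
    p.add_argument("--bidirectional", type=bool, default=True)
    p.add_argument("--dropout", type=float, default=0.5)
    p.add_argument("--hidden_dim", type=int, default=64)
    p.add_argument("--output_dim", type=int, default=1)
    p.add_argument("--n_layers", type=int, default=2)
    p.add_argument("--lr", type=float, default=1e-3)
    p.add_argument("--n_epochs", type=int, default=5)
    return p

args = build_parser().parse_args([])  # empty list -> all defaults
print(args.model, args.batch_size, args.lr)
```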

The script was run in a virtual environment using Miniconda3 on WSL Ubuntu, on a CPU.

Available models are rnn and bilstm.

Requirements

Python 3.7

  • pytorch==1.5.1
  • torchtext==0.6.0
  • transformers==3.0.2
  • texthero==1.0.9
pip install -r requirements.txt

To use the spaCy tokenizer, make sure to run python -m spacy download en in the terminal.

Results

The models were trained for 5 epochs; with more powerful hardware, I would (and should) train for longer.

| Model  | Test Loss | Test Acc |
|--------|-----------|----------|
| RNN    | 0.138     | 95.05%   |
| BiLSTM | 0.139     | 95.27%   |
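Since the models emit a single logit (output_dim=1), test accuracy for this kind of binary classifier is typically computed by passing the logit through a sigmoid and rounding. The repo's exact metric code isn't shown here, so this is a plausible sketch rather than the actual implementation:

```python
import math

def binary_accuracy(logits, labels):
    """Sigmoid each single-output logit, round to 0/1, and compare to
    the gold label. Returns the fraction of correct predictions."""
    correct = 0
    for z, y in zip(logits, labels):
        pred = 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0
        correct += int(pred == y)
    return correct / len(labels)

# Three of these four predictions match the labels.
print(binary_accuracy([2.3, -1.1, 0.4, -0.2], [1, 0, 1, 1]))  # -> 0.75
```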

I did not use Google Colab to train these models because I want to learn good software/coding practice, and notebooks can get messy and irreproducible. Additionally, using Google Colab's free GPU puts a strain on my internet connection.

Reference

Sentiment Analysis

2. Humorous Text Generation

Using only the 100k texts labelled humorous (see the last few lines of the preprocess.py script), I leveraged a pretrained OpenAI GPT-2 model from Hugging Face (a great NLP library).

Usage

python generate.py

The script takes a single sentence from the humorous-only dataset and generates 2 different continuations of it.
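Under the hood, GPT-2 generates text autoregressively: at each step the model produces a probability distribution over the vocabulary, a next token is sampled (e.g. from the top-k candidates), and the process repeats. The toy sketch below shows just the top-k sampling step with a hard-coded dummy distribution; a real model recomputes the distribution after every token:

```python
import random

def sample_next(probs, k=2, rng=random):
    """Sample the next token from the k most probable candidates,
    renormalising their probabilities (toy stand-in for GPT-2's head)."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    r, acc = rng.random() * total, 0.0
    for tok, p in top:
        acc += p
        if r <= acc:
            return tok
    return top[-1][0]  # numerical-edge fallback

rng = random.Random(0)
# Dummy next-token distribution for illustration only.
probs = {"phone": 0.5, "work": 0.3, "button": 0.1, "the": 0.1}
print([sample_next(probs, k=2, rng=rng) for _ in range(5)])
```

With k=2 only the two most likely tokens can ever be drawn, which is why top-k sampling trades diversity for coherence.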

Results

Input sentence: I think my iphone is broken i keep pressing the home button but i am still at work

Output:

  • Sentence 1: I think my iphone is broken i keep pressing the home button but i am still at work and need a new one. I haven't changed my phone on all my work devices. I've been told it needs a new phone and

  • Sentence 2: I think my iphone is broken i keep pressing the home button but i am still at work in the morning on my iPhone. i just found out after reading this that its working OK. Also, i have another broken phone with 3

I referred to this tutorial.
