S3D: A Weakly Supervised Sarcasm Dataset

This is the repository for our paper 'Utilizing Weak Supervision to Create S3D: A Sarcasm Annotated Dataset', submitted to the EMNLP NLP+CSS 2022 workshop. The repository includes our SAD dataset along with versions 1 and 2 of our S3D dataset. Both of these Twitter datasets can be used to train sarcasm detection models.

Datasets

SAD - We provide the Tweet IDs and sarcasm labels of 2340 manually annotated tweets, which were collected by monitoring the #sarcasm hashtag. Available on HuggingFace.

S3D-v1 - We provide the Tweet IDs of 100,000 tweets along with their labels, which were predicted by a BERTweet model fine-tuned on our 'Combined dataset', a corpus of over a million tweets and Reddit comments labelled for sarcasm in previous work. Available on HuggingFace.

S3D-v2 - We provide the Tweet IDs of 100,000 tweets along with their labels, which were predicted by an ensemble of our 'best' three fine-tuned sarcasm detection models. Available on HuggingFace.
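The README does not state how the three models are combined for S3D-v2. As an illustration only, the sketch below assumes a simple majority vote over binary labels (1 = sarcastic, 0 = not sarcastic); the function name `majority_vote` and the label encoding are hypothetical and may differ from what the notebooks actually do.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label lists into one label per tweet.

    `predictions` holds one list of labels per model, all of the same length.
    NOTE: majority voting is an assumed ensembling rule, not confirmed by the source.
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all models must label the same number of tweets")

    combined = []
    # zip(*predictions) yields, for each tweet, the tuple of votes across models
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Three hypothetical models labelling the same three tweets
model_a = [1, 0, 1]
model_b = [1, 1, 0]
model_c = [0, 0, 1]
ensemble_labels = majority_vote([model_a, model_b, model_c])
```

With an odd number of models and binary labels, a vote can never tie, which is one reason a three-model ensemble is convenient for this kind of weak labelling.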

Experiments

We provide a notebook to show the labelling process of our datasets. You can reproduce the experiments to create S3D-v1 and S3D-v2 via our Python notebooks, which use HuggingFace to load the relevant models and label the dataset.
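The labelling step can be sketched roughly as follows, assuming the fine-tuned models are loaded through the `transformers` text-classification pipeline. The helper names `build_classifier` and `label_tweets` are hypothetical, and the hub path in the commented usage is an assumption based on the model names in this README rather than a confirmed identifier.

```python
from typing import Callable, Dict, List


def build_classifier(model_id: str):
    """Load a text-classification pipeline for a fine-tuned model.

    The import is done lazily so that `label_tweets` can be used
    (and tested) without `transformers` installed.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def label_tweets(texts: List[str], classifier: Callable) -> List[Dict]:
    """Run the classifier over the tweets and pair each text with its prediction.

    `classifier` is expected to return one dict per input with
    "label" and "score" keys, as the transformers pipeline does.
    """
    results = classifier(texts)
    return [
        {"text": text, "label": res["label"], "score": res["score"]}
        for text, res in zip(texts, results)
    ]


# Example usage (downloads a model; the hub path below is an assumption):
# clf = build_classifier("surrey-nlp/bertweet-base-finetuned-SARC-combined-DS")
# labelled = label_tweets(["Oh great, another Monday."], clf)
```

Passing the classifier in as an argument keeps the labelling logic independent of any particular model, so the same helper could be reused for each of the fine-tuned models listed in this README.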

Models

Base Model	Fine-tuned Model	Description
BERTweet	BERTweet-base-finetuned-SARC-combined-DS	BERTweet model fine-tuned on our combined dataset
BERTweet	BERTweet-base-finetuned-SARC-DS	BERTweet model fine-tuned on the SARC dataset
RoBERTa_large	roberta-large-finetuned-SARC-combined-DS	RoBERTa_large model fine-tuned on our combined dataset

Maintainer(s)

Jordan Painter
Diptesh Kanojia

