Python scripts to extract features from transformer architectures, which are later combined with handcrafted linguistic features to classify readability using classifiers including SVM, Random Forest, and Gradient Boosting.
This code accompanies our published paper:
Pushing on Readability: A Transformer Meets Handcrafted Linguistic Features, EMNLP 2021.
Please cite our paper and provide a link to this repository if you use this code in your research.
The following datasets are used throughout the paper:
- WeeBit.csv
- OneStop.csv
- cambridge.csv
Prior to training, create stratified K-fold splits of the dataset to train on:
```bash
python kfold.py --corpus_path WeeBit.csv --corpus_name weebit
```
The stratified folds will be saved under the file name `data/weebit.{k}.{type}.csv`, where `k` denotes the k-th fold of the K-fold split and `type` is one of train, valid, or test.
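For reference, here is a minimal sketch of what `kfold.py` might do, assuming a `label` column, K=5, and a 90/10 train/valid split within each fold; the script's actual defaults and column names may differ:

```python
import os
import pandas as pd
from sklearn.model_selection import StratifiedKFold, train_test_split

def make_folds(corpus_path, corpus_name, k=5, seed=42):
    df = pd.read_csv(corpus_path)
    os.makedirs("data", exist_ok=True)
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    for fold, (train_idx, test_idx) in enumerate(skf.split(df, df["label"])):
        train_valid, test = df.iloc[train_idx], df.iloc[test_idx]
        # Carve a stratified validation split out of each fold's training part.
        train, valid = train_test_split(
            train_valid, test_size=0.1,
            stratify=train_valid["label"], random_state=seed,
        )
        for name, split in (("train", train), ("valid", valid), ("test", test)):
            split.to_csv(f"data/{corpus_name}.{fold}.{name}.csv", index=False)

make_folds("WeeBit.csv", "weebit")
```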
Then, fine-tune `BertForSequenceClassification` on the specified dataset:
```bash
python train.py --corpus_name weebit --model roberta --learning_rate 3e-5
```
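As a point of reference, `train.py` presumably performs standard Hugging Face fine-tuning along these lines. Only the learning rate (3e-5) comes from the command above; the column names (`text`, `label`), number of labels, batch size, and epoch count below are assumptions, not values from the repository:

```python
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=5)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

train_ds = Dataset.from_pandas(pd.read_csv("data/weebit.0.train.csv")).map(tokenize, batched=True)
valid_ds = Dataset.from_pandas(pd.read_csv("data/weebit.0.valid.csv")).map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="checkpoint/weebit.roberta",  # assumed checkpoint layout
    learning_rate=3e-5,                      # matches --learning_rate above
    per_device_train_batch_size=16,
    num_train_epochs=3,
)
Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=valid_ds).train()
```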
Then, create new features with the trained model:
```bash
python inference.py --checkpoint_path checkpoint/weebit.roberta.0.14 --data_path data/weebit.0.test.csv
```
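A hedged sketch of what such feature extraction might look like: the last layer's [CLS] hidden state is taken as the transformer feature vector. The pooling choice, the output path, and the assumption that the tokenizer was saved with the checkpoint are illustrative, not taken from `inference.py`:

```python
import os
import numpy as np
import pandas as pd
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckpt = "checkpoint/weebit.roberta.0.14"  # checkpoint path from the command above
tokenizer = AutoTokenizer.from_pretrained(ckpt)  # assumes tokenizer saved alongside model
model = AutoModelForSequenceClassification.from_pretrained(ckpt, output_hidden_states=True)
model.eval()

df = pd.read_csv("data/weebit.0.test.csv")
features = []
with torch.no_grad():
    for text in df["text"]:  # assumed "text" column
        enc = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
        out = model(**enc)
        # Last layer's [CLS] vector as the transformer-derived feature.
        features.append(out.hidden_states[-1][0, 0].numpy())

os.makedirs("features", exist_ok=True)
np.save("features/weebit.0.test.npy", np.stack(features))
```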
The collected features are then fed into the classifiers (SVM, Random Forest, and Gradient Boosting), as sketched below.
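For illustration, a minimal sketch of this final step, assuming transformer features saved as `.npy` files per split and a hypothetical handcrafted-feature CSV (`features/weebit.0.{split}.handcrafted.csv`); the repository's actual feature files and formats may differ:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

def load_split(split):
    # Column-stack transformer features with handcrafted linguistic features.
    trans = np.load(f"features/weebit.0.{split}.npy")
    hand = pd.read_csv(f"features/weebit.0.{split}.handcrafted.csv").to_numpy()  # hypothetical file
    labels = pd.read_csv(f"data/weebit.0.{split}.csv")["label"].to_numpy()       # assumed "label" column
    return np.hstack([trans, hand]), labels

X_train, y_train = load_split("train")
X_test, y_test = load_split("test")

# Train and evaluate each of the paper's classifiers on the fused features.
for clf in (SVC(), RandomForestClassifier(), GradientBoostingClassifier()):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, accuracy_score(y_test, clf.predict(X_test)))
```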