The ELCo Dataset

This repo provides the dataset and official implementations for our paper @ LREC-COLING 2024.
Local copy of our paper: https://yisong.me/publications/[email protected]
Local copy of our slides: https://yisong.me/publications/[email protected]

The ELCo.csv file encompasses the complete ELCo dataset, which is segmented into five distinctive columns:

EN: The English phrase;
EM: The emoji sequence corresponding to the English phrase;
Description: The description for the emojis;
Compositional strategy: The strategy used to compose the emoji, as identified in our corpus study. It includes direct representation, metaphorical representation, semantic list, reduplication, and single emojis.
Attribute: The attribute of the English phrase.

Preview of first 5 rows in the complete ELCo.csv:

EN	EM	Description	Composition strategy	Attribute
big business	👔📈	[':necktie:', ':chart_increasing:']	Metaphorical	SIZE
big business	🏢🤑🤑	[':office_building:', ':money-mouth_face:', ':money-mouth_face:']	Metaphorical	SIZE
big business	👨‍💻🤝	[':man_technologist:', ':handshake:']	Metaphorical	SIZE
big business	🏢🧑‍🤝‍🧑🧑‍🤝‍🧑🧑‍🤝‍🧑	[':office_building:', ':people_holding_hands:', ':people_holding_hands:', ':people_holding_hands:']	Metaphorical	SIZE
big business	👩‍💻🤑	[':woman_technologist:', ':money-mouth_face:']	Metaphorical	SIZE

Official Implementation for Benchmarking

Installation 📀💻

git clone [email protected]:WING-NUS/ELCo.git
conda activate
cd ELCo
cd scripts
pip install -r requirements.txt

Our codebase does not require specific versions of the packages in requirements.txt.
For most NLPers, probably you will be able to run our code with your existing virtual (conda) environments.

Running Experiments 🧪🔬

Specify Your Path 🏎️🛣️

Before running the bash files, please edit the bash file to specify your path to your local HuggingFace Cache.
For example, in scripts/unsupervised.sh:

#!/bin/bash

# Please define your own path here
huggingface_path=YOUR_PATH

you may change YOUR_PATH to the absolute directory location of your Huggingface Cache (e.g. /disk1/yisong/hf-cache).

Unsupervised Evaluation on EmoTE Task: 📘📝

conda activate
cd ELCo
bash scripts/unsupervised.sh

Fine-tuning on EmoTE Task: 📖📝

conda activate
cd ELCo
bash scripts/fine-tune.sh

Scaling Experiments: 📈

conda activate
cd ELCo
bash scripts/scaling.sh

Codebase Map 🗺️👩‍💻👨‍💻

All code is stored in the scripts directory. Data is located in benchmark_data.
Our bash files execute various configurations of emote.py:

emote.py: The controller for the entire set of experiments. Data loaders and encoders are also implemented here;
emote_config.py: This configuration file takes parameters from argparse as input and returns a configuration class, which is convenient for subsequent functions to call;
unsupervised.py: Called by emote.py, it performs unsupervised evaluation using a frozen model pretrained on the MNLI dataset. On the first run, a pretrained model will be downloaded from HuggingFace to your specified huggingface_path. Ensure there's enough space available (we recommend at least 20GB). The results are saved at benchmark_data/results/TE-unsup/ directory. This directory will be automatically created once the experiments are performed;
finetune.py: Also called by emote.py, it fine-tunes the pretrained models. This script saves the classification_report for each fine-tuning epoch and records the best test accuracy (when validation accuracy is optimized) in the _best.csv file at benchmark_data/results/TE-finetune/ directory. This directory will be automatically created once the experiments are performed.

Citations

If you find our work interesting, you are most welcome to try our dataset/codebase.
Please kindly cite our research if you have used our dataset/codebase:

@inproceedings{ELCoDataset2024,
    title = "The ELCo Dataset: Bridging Emoji and Lexical Composition",
    author = {Yang, Zi Yun  and
    	Zhang, Ziqing and
      Miao, Yisong},
    booktitle = "Proceedings of The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation",
    month = May,
    year = "2024",
    address = "Turino, Italy",
}

Contact 📤📥

If you have questions or bug reports, please raise an issue or contact us directly via the email:
Email address: 🐿@🐰
where 🐿️=yisong, 🐰=comp.nus.edu.sg

Licence

CC By 4.0

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
benchmark_data/exp-entailment		benchmark_data/exp-entailment
docs		docs
scripts		scripts
ELCo.csv		ELCo.csv
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The ELCo Dataset

Official Implementation for Benchmarking

Installation 📀💻

Running Experiments 🧪🔬

Specify Your Path 🏎️🛣️

Unsupervised Evaluation on EmoTE Task: 📘📝

Fine-tuning on EmoTE Task: 📖📝

Scaling Experiments: 📈

Codebase Map 🗺️👩‍💻👨‍💻

Citations

Contact 📤📥

Licence

About

Releases

Packages

Languages

WING-NUS/ELCo

Folders and files

Latest commit

History

Repository files navigation

The ELCo Dataset

Official Implementation for Benchmarking

Installation 📀💻

Running Experiments 🧪🔬

Specify Your Path 🏎️🛣️

Unsupervised Evaluation on EmoTE Task: 📘📝

Fine-tuning on EmoTE Task: 📖📝

Scaling Experiments: 📈

Codebase Map 🗺️👩‍💻👨‍💻

Citations

Contact 📤📥

Licence

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages