forked from WenZhihao666/G2P2

Augmenting Low-Resource Text Classification with Graph-Grounded Pre-training and Prompting. SIGIR-2023


G2P2 Rec

Setup Environment

bash setup_env.sh

Downstream adaptation for recommendation

  1. Prepare the processed dataset

We uploaded the preprocessed dataset and checkpoint to Hugging Face; use the commands below to download them.

# preprocessed dataset
huggingface-cli download Leon-Chang/exp --repo-type dataset --local-dir ./tmp/

# checkpoint
mkdir -p res/Musical_Instruments/

huggingface-cli download Leon-Chang/g2p2_ckpts --repo-type dataset --local-dir ./res/Musical_Instruments/
  2. Run the experiment

We provide a script to run the experiment; for example, the command below runs it on the Musical_Instruments dataset:
bash fs_epochs_metric.sh Musical_Instruments
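After both downloads, a quick sanity check can confirm the files landed where fs_epochs_metric.sh expects them. This small helper is not part of the repository; the default paths simply mirror the --local-dir arguments above.

```python
from pathlib import Path

def check_downloads(dataset_dir="./tmp", ckpt_dir="./res/Musical_Instruments"):
    """Return a list of problems; an empty list means both directories look populated."""
    problems = []
    for label, d in (("dataset", Path(dataset_dir)),
                     ("checkpoint", Path(ckpt_dir))):
        if not d.is_dir():
            problems.append(f"{label} directory {d} does not exist")
        elif not any(d.iterdir()):
            problems.append(f"{label} directory {d} is empty")
    return problems

# Example:
# for problem in check_downloads():
#     print("WARNING:", problem)
```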

Pre-Train

Make sure the Amazon dataset is in the data folder. For example, for the Musical_Instruments dataset:

mkdir data; cd data
wget https://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Musical_Instruments.json.gz

and also its metadata:

wget https://snap.stanford.edu/data/amazon/productGraph/categoryFiles/meta_Musical_Instruments.json.gz
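These SNAP dumps are gzipped with one record per line, and the records in these categoryFiles are Python dict literals rather than strict JSON, so a plain json parser alone may fail. A minimal reader sketch under that assumption (this helper is not part of the repository; preprocess_amazon.py has its own loader):

```python
import ast
import gzip
import json

def parse(path):
    """Yield one record per line from a gzipped Amazon reviews/metadata file.

    Try strict JSON first; fall back to ast.literal_eval for lines that
    are Python dict literals (single-quoted keys), as in the raw dumps.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                yield ast.literal_eval(line)

# Example (after downloading into data/):
# for review in parse("data/reviews_Musical_Instruments.json.gz"):
#     print(review["asin"], review.get("overall"))
#     break
```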
  1. Preprocess the dataset
    python g2p2_ext/preprocess_amazon.py
  2. Pre-train the model

Note: this step may take around one day per epoch, depending on your device.

If you want to reproduce the model from scratch, just run the command below; otherwise, use our model checkpoint (see the download instructions in the section above).

    python main_train_amazon.py
