bash setup_env.sh
- Prepare the processed dataset
We uploaded the preprocessed dataset and the checkpoint to Hugging Face; use the commands below to download them.
# preprocessed dataset
huggingface-cli download Leon-Chang/exp --repo-type dataset --local-dir ./tmp/
# checkpoint
mkdir -p res/Musical_Instruments/
huggingface-cli download Leon-Chang/g2p2_ckpts --repo-type dataset --local-dir ./res/Musical_Instruments/
- Run the experiment
We provide a script to run the experiment. For example, the command below runs it on the Musical_Instruments dataset:
bash fs_epochs_metric.sh Musical_Instruments
Make sure the Amazon dataset is in the data folder. For example, for the Musical_Instruments dataset:
mkdir data; cd data
wget https://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Musical_Instruments.json.gz
Then download its metadata as well:
wget https://snap.stanford.edu/data/amazon/productGraph/categoryFiles/meta_Musical_Instruments.json.gz
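The two downloads above follow the same URL pattern, so the same steps could be scripted for another category. The following is a minimal sketch, not part of this repository: `amazon_urls` is a hypothetical helper, and it assumes other categories use the same `reviews_<category>.json.gz` / `meta_<category>.json.gz` naming as Musical_Instruments.

```shell
#!/usr/bin/env bash
# Sketch: build the two SNAP download URLs for one Amazon category.
# BASE_URL is copied from the wget commands above.
# amazon_urls is a hypothetical helper, not a script shipped with this repo.
BASE_URL="https://snap.stanford.edu/data/amazon/productGraph/categoryFiles"

amazon_urls() {
    local category="$1"
    # Reviews file first, then the metadata file.
    printf '%s/reviews_%s.json.gz\n' "$BASE_URL" "$category"
    printf '%s/meta_%s.json.gz\n' "$BASE_URL" "$category"
}

# Print the URLs for the example category.
# To download them into data/, the output can be piped to wget:
#   amazon_urls Musical_Instruments | wget -P data -i -
amazon_urls Musical_Instruments
```

After downloading, `gzip -t data/*.json.gz` is a quick way to check that the archives are not truncated.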
- Preprocess the dataset
python g2p2_ext/preprocess_amazon.py
- Pre-Train the model
Note: this step is slow; expect roughly one day per epoch, depending on your device.
To reproduce the model yourself, run the command below. Otherwise, you can use our model checkpoint, as described in the checkpoint download step.
python main_train_amazon.py
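Since pre-training is slow, it may help to check for a downloaded checkpoint before starting it. The following is a minimal sketch under assumptions: `has_checkpoint` is a hypothetical helper, and it assumes the checkpoint lives in `res/Musical_Instruments/` (the directory used in the download step); the training script itself may read or write checkpoints elsewhere.

```shell
#!/usr/bin/env bash
# Sketch: decide whether pre-training is needed.
# has_checkpoint is a hypothetical helper, not part of this repo.
# It succeeds only if the directory exists and contains at least one entry.
has_checkpoint() {
    local dir="$1"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Assumption: the checkpoint directory matches the download step.
CKPT_DIR="res/Musical_Instruments"

if has_checkpoint "$CKPT_DIR"; then
    echo "checkpoint found in $CKPT_DIR, pre-training can be skipped"
else
    echo "no checkpoint in $CKPT_DIR, run: python main_train_amazon.py"
fi
```

This only checks that the directory is non-empty; it does not verify that the files are complete or compatible with the current code.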