ADCE and AICE

The official code for the paper "Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension Ability" (ICLR'25).

Pipeline

Environment

OS: Ubuntu 22.04.2 LTS 

python=3.10.14
torch=2.4.1
transformers=4.43.4
llama-recipes=0.0.3
datasets=2.21.0
accelerate=0.34.2
evaluate=0.4.3
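
To confirm that an existing environment matches the pins above, a minimal sketch along the following lines may help. It relies only on the standard library's `importlib.metadata`. The names `PINNED`, `installed_version`, and `check_pins` are illustrative and are not part of this repository.

```python
from importlib import metadata

# Pins copied from the list above (the Python version itself is not checked here).
PINNED = {
    "torch": "2.4.1",
    "transformers": "4.43.4",
    "llama-recipes": "0.0.3",
    "datasets": "2.21.0",
    "accelerate": "0.34.2",
    "evaluate": "0.4.3",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pinned=PINNED, lookup=installed_version):
    """Map each mismatching package name to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # A local build tag such as "2.4.1+cu121" still counts as a match.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins().items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at the pinned version.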

Fine-tuning

cd finetune

# civil comments
python finetune_civil.py --model_name llama-3-8b --level 0.3

# analytic entailment
python finetune_entail.py

ADCE & AICE Calculation

To test closed-source models, run the commands below:

cd dce_calculation

# civil comments
bash agent_main.sh --dataset_name 2_digit_multiplication --prompt superhigh --interv_type mask --num_mask 2 --mask_fix_position 0

# 2_digit_multiplication
bash agent_main.sh --dataset_name 2_digit_multiplication --prompt superhigh --interv_type mask --num_mask 2 --mask_fix_position 0

# analytic_entailment
bash agent_main.sh --dataset_name analytic_entailment --interv_type rephrase --prompt superhigh --mask_fix_position 0 --num_mask 2

# GSM8k
bash agent_main.sh --dataset_name GSM8k --prompt superhigh --interv_type mask --mask_fix_position 0 --num_mask 2

# word_unscrambling
bash agent_main.sh --dataset_name word_unscrambling --mask_fix_position 2 --num_mask 1 --interv_type mask --prompt superhigh

# CommonsenseQA
bash agent_main.sh --dataset_name commonsenseqa --prompt csuperhigh --interv_type rephrase --num_mask 2 --mask_fix_position 0

To test open-source models, run the commands below:

cd dce_calculation

# civil comments
bash white_main.sh --dataset_name 2_digit_multiplication --prompt superhigh --interv_type mask --num_mask 2 --mask_fix_position 0

# 2_digit_multiplication
bash white_main.sh --dataset_name 2_digit_multiplication --prompt superhigh --interv_type mask --num_mask 2 --mask_fix_position 0

# analytic_entailment
bash white_main.sh --dataset_name analytic_entailment --interv_type rephrase --prompt superhigh --num_mask 2 --mask_fix_position 0

# GSM8k
bash white_main.sh --dataset_name GSM8k --prompt superhigh --interv_type mask --num_mask 2 --mask_fix_position 0

# word_unscrambling
bash white_main.sh --dataset_name word_unscrambling --mask_fix_position 2 --num_mask 1 --interv_type mask --prompt superhigh

# CommonsenseQA
bash white_main.sh --dataset_name commonsenseqa --prompt csuperhigh --interv_type rephrase --num_mask 2 --mask_fix_position 0
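
The per-dataset invocations above differ only in a few flag values, so they can be generated from a small table. The sketch below builds the argument lists and, when `dry_run` is disabled, executes them with `subprocess.run`. The names `RUNS`, `build_command`, and `run_all` are illustrative and are not part of this repository. The flag values are copied from the commands above, and the sketch assumes that the scripts accept the flags in any order and that it is launched from the `dce_calculation` directory.

```python
import subprocess

# (dataset_name, prompt, interv_type, num_mask, mask_fix_position),
# copied from the commands listed above.
RUNS = [
    ("2_digit_multiplication", "superhigh", "mask", 2, 0),
    ("analytic_entailment", "superhigh", "rephrase", 2, 0),
    ("GSM8k", "superhigh", "mask", 2, 0),
    ("word_unscrambling", "superhigh", "mask", 1, 2),
    ("commonsenseqa", "csuperhigh", "rephrase", 2, 0),
]


def build_command(script, dataset_name, prompt, interv_type,
                  num_mask, mask_fix_position):
    """Return the argument list for one run of agent_main.sh or white_main.sh."""
    return [
        "bash", script,
        "--dataset_name", dataset_name,
        "--prompt", prompt,
        "--interv_type", interv_type,
        "--num_mask", str(num_mask),
        "--mask_fix_position", str(mask_fix_position),
    ]


def run_all(script, dry_run=True):
    """Print (dry run) or execute every configured command; return the commands."""
    commands = [build_command(script, *run) for run in RUNS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Use "agent_main.sh" for closed-source models instead.
    run_all("white_main.sh", dry_run=True)
```

Keeping `dry_run=True` prints the commands without launching anything, which makes it easy to compare them against the list above before starting a long run.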

Automatic paraphrase

cd intervention_rephrase
python generate_intervention_commonsenseqa.py

Citation

Please cite our paper if this repository inspires your work.

@misc{han2024surfacestructurecausalassessment,
      title={Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension Ability}, 
      author={Yujin Han and Lei Xu and Sirui Chen and Difan Zou and Chaochao Lu},
      year={2024},
      eprint={2411.19456},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.19456}, 
}
