
Commit

init code&data
ruixing76 committed Jun 17, 2024
1 parent fa7d877 commit 54c7d0f
Showing 35 changed files with 384,897 additions and 4,181 deletions.
Binary file modified .DS_Store
4 changes: 2 additions & 2 deletions .gitignore
@@ -1,5 +1,5 @@
-# private file
-*GPT*.py
+# Mac file
+.DS_Store

# Byte-compiled / optimized / DLL files
__pycache__/
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

-Copyright (c) 2023 Rui Xing
+Copyright (c) 2024 Rui Xing

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
91 changes: 90 additions & 1 deletion README.md
@@ -1,2 +1,91 @@
# Transparent-FCExp
Evaluating Transparency of Machine Generated Fact Checking Explanations

![License](https://img.shields.io/badge/license-MIT-blue)
![Version](https://img.shields.io/badge/version-1.0.0-blue)
[![made-with-python](https://img.shields.io/badge/Made%20with-Python-red.svg)](#python)
![Coverage](https://img.shields.io/badge/coverage-70%25-brightgreen)



This repository contains the code and data for the paper "Evaluating Transparency of Machine Generated Fact Checking Explanations".

## Abstract
An important factor when it comes to generating fact-checking explanations is the selection of evidence: intuitively, high-quality explanations can only be generated given the right evidence. In this work, we investigate the impact of human-curated vs. machine-selected evidence for explanation generation using large language models. To assess the quality of explanations, we focus on transparency (whether an explanation cites sources properly) and utility (whether an explanation is helpful in clarifying a claim). Surprisingly, we found that large language models generate similar or higher quality explanations using machine-selected evidence, suggesting carefully curated evidence (by humans) may not be necessary. That said, even with the best model, the generated explanations are not always faithful to the sources, suggesting further room for improvement in explanation generation for fact-checking.
## Installation
```
git clone git@github.com:ruixing76/Transparent-FCExp.git
cd Transparent-FCExp
pip install -r requirement.txt
```

## Data
### Original data
The original PolitiHop data can be downloaded here: https://github.com/copenlu/politihop. Please put the data under `./data/PolitiHop_data/`.

### Generated Explanation
Our work mainly uses the generated explanation data, which is stored at `./data/TransExp_data/{model_name}_{setting}_data.json`.

- `model_name` should be `gpt4`, `gpt35` or `llama2-70b`.
- `setting` should be `core` (Human setting) or `full` (Machine setting).
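
For example, the GPT-4 explanations generated under the Human setting can be loaded as follows (a minimal sketch; `gpt4` and `core` are just one of the model/setting combinations listed above):

```python
import json

# Pick one model/setting combination from the lists above.
model_name = "gpt4"   # or "gpt35", "llama2-70b"
setting = "core"      # "core" = Human setting, "full" = Machine setting

path = f"./data/TransExp_data/{model_name}_{setting}_data.json"
with open(path, encoding="utf-8") as f:
    data = json.load(f)  # dict keyed by claim ID

print(f"Loaded {len(data)} claims from {path}")
```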

### Data Format
```json
"CLAIM_ID": {
"claim": "claim content",
"label": "claim veracity label from {true, false and half-true}",
// No. 12 reason cited in explanation is masked
"masked_reason": 12,
// No.1 sentence is the ground-truth
"ans_sens": [
1
],
"core_reasons": [
"12: No.12 core reason content"
],
// 1: explanation sentence is the answer sentence
"explanation": [
"0: explanation sentence",
"1: explanation sentence [12]",
"2: explanation sentence"
],
// top 2 annotator's choices, -2 indicates 'no citation'
"top_choice": [
-2
2
]
}
```
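
As a sketch of how these fields fit together (assuming `data` is the dictionary loaded in the snippet above, and that citations follow the bracketed `[12]` style shown in the format), the snippet below recovers the cited reason indices from each explanation sentence and checks whether the masked reason is among them:

```python
import re

CITATION = re.compile(r"\[(\d+)\]")  # matches bracketed reason indices like [12]

for claim_id, entry in data.items():
    cited = {int(n) for sen in entry["explanation"] for n in CITATION.findall(sen)}
    print(claim_id, entry["label"],
          "masked reason cited:", entry["masked_reason"] in cited)
```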

## Preprocessing
```
python preprocess_politihop.py
```
Alternatively, we recommend using our preprocessed data at `./data/TransExp_data/raw_dataset/raw_dataset.json`.

## Explanation Generation
Generate explanations using:
```
python generate_explanation.py -model_name llama2-70b -output_dir output_dir
```
- `-model_name` should be `gpt4`, `gpt35` or `llama2-70b`.
- `-output_dir` is the output directory for the generated explanations.

Postprocess the generated explanations (extract, mask, and sample citations):
```
python postprocess_generation.py -model_name llama2-70b -output_dir output_dir
```
- `-model_name` should be `gpt4`, `gpt35` or `llama2-70b`.
- `-output_dir` is the output directory for the postprocessed explanations.

## Annotation
Generate the annotation data using:
```
cd ./annotation
python create_annotation_data.py
```

Annotation is performed on [Amazon Mechanical Turk](https://www.mturk.com/). The webpage template can be found under `./annotation/annotation_platform.html`. Create the annotation tasks (HITs) using:
```
python create_HIT.py
```

## Cite
If you find this work useful, please cite our paper.