Please check the datasets folder for details.
To use this project, you need to have the following dependencies installed:
- Python 3.7 or higher
- The Python libraries specified in requirements.txt
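Before installing anything, you can confirm that your interpreter meets the Python 3.7 requirement with a small helper like the one below. `python_version_ok` is a hypothetical name used only for illustration. It is not part of this repository.

```python
import sys

# Minimum interpreter version stated in the dependency list.
MIN_VERSION = (3, 7)


def python_version_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if python_version_ok():
        print("Python version OK:", sys.version.split()[0])
    else:
        sys.exit("Python 3.7 or higher is required.")
```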
Clone this repository to your local machine:

```bash
git clone https://github.com/mooselab/suppmaterial-PostDupGPT3.git
```

Then install the Python libraries using pip:

```bash
cd ./suppmaterial-PostDupGPT3/src
pip install -r requirements.txt
```
This replication package contains a tiny sample dataset for testing the code.

To train with triplet loss, run the following from the repository root:

```bash
cd ./src
python ./train_triplet_loss.py
```
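As a rough illustration of the objective that train_triplet_loss.py is named after, the sketch below computes the standard triplet margin loss, max(d(a, p) - d(a, n) + margin, 0), for one (anchor, positive, negative) triple of embeddings. The Euclidean distance, the margin value, and the function names are assumptions. The repository's script may use a different distance, margin, or library implementation.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 1.0) -> float:
    """Standard triplet margin loss (hypothetical helper, not the repo's code).

    The loss is zero once the anchor is closer to the positive (duplicate)
    post than to the negative (non-duplicate) post by at least `margin`;
    otherwise it grows linearly with the violation.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)
```

For example, with anchor [0, 0], positive [1, 0], and negative [1.5, 0], the negative is only 0.5 farther away than the positive, so a margin of 1.0 yields a loss of 0.5.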
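To train with Multiple Negatives Ranking (MNR) loss instead, run the following from the repository root:

```bash
cd ./src
python ./train_MNR_loss.py
```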
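For intuition about the second objective, here is a minimal sketch of a multiple negatives ranking loss in the formulation popularized by the sentence-transformers `MultipleNegativesRankingLoss`. For a batch of (anchor, positive) pairs, each anchor's own positive is the target and every other positive in the batch acts as an in-batch negative. The cosine similarity, the scale of 20.0, and the function names are assumptions. This is not the code in train_MNR_loss.py.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors: List[Sequence[float]],
             positives: List[Sequence[float]],
             scale: float = 20.0) -> float:
    """Mean cross-entropy of picking positives[i] for anchors[i] (hypothetical helper).

    Row i of the score matrix holds the scaled similarities between anchor i
    and every positive in the batch; the correct "class" is column i.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, p) for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[i]
    return total / len(anchors)
```

When each anchor is most similar to its own positive the loss approaches zero. If the pairs are shuffled so every anchor points at the wrong positive, the loss becomes large.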
We re-implemented DupPredictor. The re-implementation is in the DupPredictor_ReImp folder, which also includes a dataset used for the parameter grid search and the CQADupStack test sets for nine domains.
- Install the requirements (from the repository root):
  ```bash
  cd ./DupPredictor_ReImp
  pip install -r requirements.txt
  ```
- Search for the best parameters for the composer:
  ```bash
  python ./DupPred_param_search.py
  ```
  This step also trains the topic model. The trained model is saved and reused during evaluation.
- Predict on the test sets:
  ```bash
  python DupPredictor.py
  ```
  This iterates over all nine sub-domains.
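DupPred_param_search.py is described only as searching for the best composer parameters. Assuming the composer combines per-feature similarity scores (for instance title, body, topic, and tag similarities) as a weighted sum, and assuming the search maximizes a ranking metric such as recall@k, a grid search could be sketched as follows. The input format, the weight grid, the recall@k objective, and all function names are hypothetical. They do not describe the actual script.

```python
import itertools
from typing import List, Sequence, Tuple

# Hypothetical input: for each query, a list of candidate posts (each a tuple
# of per-feature similarity scores) and the index of the true duplicate.
Query = Tuple[List[Sequence[float]], int]


def compose(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted linear combination of per-feature similarity scores."""
    return sum(w * s for w, s in zip(weights, scores))


def recall_at_k(queries: List[Query], weights: Sequence[float], k: int = 5) -> float:
    """Fraction of queries whose true duplicate is ranked in the top k."""
    hits = 0
    for candidates, dup_index in queries:
        ranked = sorted(range(len(candidates)),
                        key=lambda i: compose(candidates[i], weights),
                        reverse=True)
        if dup_index in ranked[:k]:
            hits += 1
    return hits / len(queries)


def grid_search(queries: List[Query], n_features: int,
                steps=(0.0, 0.25, 0.5, 0.75, 1.0), k: int = 5):
    """Evaluate every weight combination on the grid and keep the best one."""
    best_weights, best_recall = None, -1.0
    for weights in itertools.product(steps, repeat=n_features):
        if not any(weights):
            continue  # skip the all-zero combination
        recall = recall_at_k(queries, weights, k)
        if recall > best_recall:  # strict ">" keeps the first of tied combinations
            best_weights, best_recall = weights, recall
    return best_weights, best_recall
```

An exhaustive grid grows as len(steps) ** n_features, which stays manageable for a handful of features but is why a coarse grid is used here.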
The Tools folder contains the code used to generate the GPT-3 embeddings and to print the prediction results.
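The interfaces of the Tools scripts are not documented here. Assuming prediction results are produced by ranking candidate posts by the cosine similarity of their embeddings to a query post, a minimal sketch might look like this. The post IDs, the toy three-dimensional vectors, and the function names are made up for illustration, and real GPT-3 embeddings have far more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k_candidates(query: Sequence[float],
                     candidates: Dict[str, Sequence[float]],
                     k: int = 3) -> List[Tuple[str, float]]:
    """Return the k candidate post IDs most similar to the query (hypothetical helper)."""
    scored = [(post_id, cosine_similarity(query, emb)) for post_id, emb in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings for illustration only.
    query = [0.1, 0.9, 0.0]
    candidates = {
        "post_a": [0.1, 0.8, 0.1],
        "post_b": [0.9, 0.1, 0.0],
        "post_c": [0.0, 1.0, 0.0],
    }
    for post_id, score in top_k_candidates(query, candidates, k=2):
        print(f"{post_id}\t{score:.3f}")
```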
Xingfang Wu, Heng Li, Nobukazu Yoshioka, Hironori Washizaki, Foutse Khomh, Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection, Proceedings of the 31st IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), March 12 - 15, 2024, Rovaniemi, Finland, IEEE.
This project is licensed under the MIT License.