- Linux
- Python 3.9
- PyTorch 1.10.0+cu111
Clone this repo.
git clone https://github.com/THUDM/paper-source-trace.git
cd paper-source-trace
Install the dependencies with:
pip install -r requirements.txt
The dataset can be downloaded from BaiduPan (password: bft3), Aliyun, or DropBox. The paper XML files were generated from the paper PDFs using the Grobid API.
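Since the XML files come from Grobid, they follow the TEI format, in which cited works appear as `biblStruct` entries under `listBibl`. The snippet below is a minimal sketch of listing reference titles from such a file. The inline `SAMPLE_TEI` document and the helper name `list_reference_titles` are illustrative assumptions, not part of this repo, and real files contain many more fields.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by Grobid output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily trimmed TEI document used only for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <back>
      <div type="references">
        <listBibl>
          <biblStruct xml:id="b0">
            <analytic><title level="a" type="main">First cited paper</title></analytic>
          </biblStruct>
          <biblStruct xml:id="b1">
            <monogr><title level="m">A cited book</title></monogr>
          </biblStruct>
        </listBibl>
      </div>
    </back>
  </text>
</TEI>"""


def list_reference_titles(tei_xml: str) -> list[str]:
    """Return the first title found in each bibliography entry."""
    root = ET.fromstring(tei_xml)
    titles = []
    for bibl in root.iterfind(".//tei:listBibl/tei:biblStruct", TEI_NS):
        title = bibl.find(".//tei:title", TEI_NS)
        # Entries without a title are kept as empty strings to preserve order.
        titles.append((title.text or "").strip() if title is not None else "")
    return titles


if __name__ == "__main__":
    for t in list_reference_titles(SAMPLE_TEI):
        print(t)
```

To use it on a downloaded file, read the file contents into a string and pass it to `list_reference_titles`.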
Run Baselines for KDD Cup 2024
First, download the DBLP dataset from AMiner. Put the unzipped PST directory into data/ and the unzipped DBLP dataset into data/PST/.
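Before running the baselines, it can help to confirm that the directories described above are in place. The following is a small sketch that checks only for `data/` and `data/PST/`; the helper name `missing_data_dirs` is hypothetical, and the check deliberately does not assume any file names inside those directories.

```python
from pathlib import Path

# Directories named in the setup instructions, relative to the project root.
EXPECTED_DIRS = ("data", "data/PST")


def missing_data_dirs(project_root: str) -> list[str]:
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Data directories found.")
```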
cd $project_path
export CUDA_VISIBLE_DEVICES='?' # specify which GPU(s) to be used
export PYTHONPATH="`pwd`:$PYTHONPATH"
# Method 1: Random Forest
python rf/process_kddcup_data.py
python rf/model_rf.py # output at out/kddcup/rf/
# Method 2: Network Embedding
python net_emb.py # output at out/kddcup/prone/
# Method 3: SciBERT
python bert.py # output at out/kddcup/scibert/
| Method | MAP |
|---|---|
| Random Forest | 0.21420 |
| ProNE | 0.21668 |
| SciBERT | 0.29489 |
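MAP in the table above stands for mean average precision. As a rough illustration of how such a score is computed, the sketch below ranks each paper's references by predicted score, computes average precision per paper, and averages across papers. The function names and the toy labels and scores are assumptions for illustration; the official evaluation may differ in details such as tie handling or how papers without any positive label are treated.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one paper.

    labels[i] is 1 if reference i is a source paper, else 0;
    scores[i] is the predicted importance of reference i.
    """
    # Rank references by descending predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    # Assumption: a paper with no positive labels contributes 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    all_labels: Sequence[Sequence[int]], all_scores: Sequence[Sequence[float]]
) -> float:
    """Mean of per-paper average precision."""
    aps = [average_precision(l, s) for l, s in zip(all_labels, all_scores)]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Toy example with two papers.
    labels = [[1, 0, 1], [0, 1]]
    scores = [[0.9, 0.8, 0.7], [0.9, 0.1]]
    print(mean_average_precision(labels, scores))
```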
If you find this repo useful in your research, please cite the following papers:
@article{zhang2024pst,
title={PST-Bench: Tracing and Benchmarking the Source of Publications},
author={Fanjin Zhang and Kun Cao and Yukuo Cen and Jifan Yu and Da Yin and Jie Tang},
journal={arXiv preprint arXiv:2402.16009},
year={2024}
}
@inproceedings{zhang2024oag,
title={OAG-bench: a human-curated benchmark for academic graph mining},
author={Fanjin Zhang and Shijie Shi and Yifan Zhu and Bo Chen and Yukuo Cen and Jifan Yu and Yelin Chen and Lulu Wang and Qingfei Zhao and Yuqing Cheng and Tianyi Han and Yuwei An and Dan Zhang and Weng Lam Tam and Kun Cao and Yunhe Pang and Xinyu Guan and Huihui Yuan and Jian Song and Xiaoyan Li and Yuxiao Dong and Jie Tang},
booktitle={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
pages={6214--6225},
year={2024}
}
Hello everyone,
We've created an online WeChat paper-sharing group in which each member is required to share 2 computer science papers every week. Members who share papers as required are rewarded, and those who do not are penalized. You are free to join or leave at any time, and you are welcome to join us! The up-to-date QR code is available from this channel: https://t.me/+apOrPEOLGixiNjdl