# FiGNN for CTR prediction

The code and data for our CIKM 2019 paper: Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction ([arXiv](https://arxiv.org/abs/1910.05552)).

<div align=center>
<img src="https://github.com/Songweiping/RecSys/blob/master/featureRec/figures/model.png" width = 50% height = 50% />
</div>

In AutoInt, we first project all sparse features (both categorical and numerical) into a low-dimensional space. Next, we feed the embeddings of all fields into multiple stacked interacting layers implemented by a self-attentive neural network. The output of the final interacting layer is the low-dimensional representation of the learnt combinatorial features, which is then used to estimate the CTR via a sigmoid function.

Next, we describe how to run FiGNN on four benchmark datasets.

## Requirements

* **TensorFlow 1.5.0**
* Python 3.6
* CUDA 9.0+ (for GPU)
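
A quick way to sanity-check the environment (a minimal sketch; it only assumes TensorFlow is already installed in the active Python environment):

```
# Check the installed TensorFlow version and GPU visibility.
import tensorflow as tf

print("TensorFlow version:", tf.__version__)         # expect 1.5.0
print("GPU available:", tf.test.is_gpu_available())  # True if CUDA 9.0+ is set up correctly
```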

## Usage

Our code is based on Weiping Song and Chence Shi's [AutoInt](https://github.com/DeepGraphLearning/RecommenderSystems/tree/master/featureRec).

### Input Format

The required input data is in the following format (a toy example follows the list):

* train_x: matrix with shape *(num_sample, num_field)*. train_x[s][t] is the feature value of feature field t of sample s in the dataset. The default value for a categorical feature is 1.
* train_i: matrix with shape *(num_sample, num_field)*. train_i[s][t] is the feature index of feature field t of sample s in the dataset. The maximal value of train_i is the feature size.
* train_y: the label of each sample in the dataset.
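
For concreteness, here is a tiny hypothetical example of the three arrays; the indices and values are made up, and only the shapes and the index/value split follow the format above:

```
import numpy as np

# Toy example: 2 samples, 3 feature fields (fields 0 and 1 categorical, field 2 numerical).
train_i = np.array([[0, 5, 17],        # sample 0: global index of the active feature in each field
                    [1, 7, 17]])       # sample 1
train_x = np.array([[1.0, 1.0, 0.3],   # categorical fields use the default value 1,
                    [1.0, 1.0, 0.8]])  # numerical fields keep their (possibly scaled) value
train_y = np.array([1, 0])             # click labels

print(train_i.shape, train_x.shape, train_y.shape)  # (2, 3) (2, 3) (2,)
```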

If you want to know how to preprocess the data, please refer to `data/Dataprocess/Criteo/preprocess.py`.

### Example

Four public real-world datasets (Avazu, Criteo, KDD12, MovieLens-1M) are supported. You can run the code on the MovieLens-1M dataset directly in `/movielens`. The other three datasets are very large and cannot fit into memory as a whole, so we split each of them into 10 parts and use the first file as the test set and the second file as the validation set. We provide the preprocessing code for these three datasets in `data/Dataprocess`. If you want to reuse it, first run `preprocess.py` to generate `train_x.txt, train_i.txt, train_y.txt` as described in `Input Format`. Then run `data/Dataprocess/Kfold_split/StratifiedKfold.py` to split the whole dataset into ten folds. Finally, you can run `scale.py` to scale the numerical values (optional).

To help you verify the correctness of the code and familiarize yourself with it, we provide the first `10000` samples of the `Criteo` dataset in `train_examples.txt`, together with scripts for preprocessing and training (see `data/sample_preprocess.sh` and `run_criteo.sh`; you may need to modify the paths in `config.py` and `run_criteo.sh`).

After you run `data/sample_preprocess.sh`, you should get a folder named `Criteo` containing `part*, feature_size.npy, fold_index.npy, train_*.txt`. `feature_size.npy` stores the total number of features, which is used to initialize the model. `train_*.txt` is the whole dataset. If you use another small dataset, say `MovieLens-1M`, you only need to modify the function `_run_` in `autoint/train.py`.
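
To double-check the preprocessing output, you can inspect the generated arrays with NumPy. This is only a sketch; it assumes you run it from the `data/` directory and that `fold_index.npy` holds the per-fold sample indices produced by the K-fold split:

```
import numpy as np

# Artifacts produced by data/sample_preprocess.sh
feature_size = np.load('Criteo/feature_size.npy')                 # total number of features (used to initialize the model)
fold_index = np.load('Criteo/fold_index.npy', allow_pickle=True)  # assumed: indices of the ten stratified folds

print('feature size:', feature_size)
print('number of folds:', len(fold_index))
```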

Here's how to run the preprocessing:

```
cd data
mkdir Criteo
# generate train_x.txt, train_i.txt, train_y.txt
python ./Dataprocess/Criteo/preprocess.py
# split the dataset into ten stratified folds
python ./Dataprocess/Kfold_split/stratifiedKfold.py
# scale the numerical values (optional)
python ./Dataprocess/Criteo/scale.py
```

Besides our proposed FiGNN model, you can also choose the AutoInt model. Specify the model type (FiGNN or AutoInt) when running the training.

Here's how to run the training:

```
CUDA_VISIBLE_DEVICES=0 python -m code.train \
                        --model_type FiGNN \
                        --data_path data --data Criteo \
                        --blocks 3 --heads 2 --block_shape "[64,64,64]" \
                        --is_save --has_residual \
                        --save_path ./models/Criteo/fignn_64x64x64/ \
                        --field_size 39 --run_times 1 \
                        --epoch 3 --batch_size 1024
```

You should see output like this:

```
...
train logs
...
start testing!...
restored from ./models/Criteo/b3h2_64x64x64/1/
test-result = 0.8088, test-logloss = 0.4430
test_auc [0.8088305055534442]
test_log_loss [0.44297631300399626]
avg_auc 0.8088305055534442
avg_log_loss 0.44297631300399626
```

## Citation

If you find FiGNN useful for your research, please consider citing the following paper:

```
@inproceedings{li2019fi,
  title={Fi-gnn: Modeling feature interactions via graph neural networks for ctr prediction},
  author={Li, Zekun and Cui, Zeyu and Wu, Shu and Zhang, Xiaoyu and Wang, Liang},
  booktitle={Proceedings of the 28th ACM International Conference on Information and Knowledge Management},
  pages={539--548},
  year={2019}
}
```

## Contact information

You can contact Zekun Li (`[email protected]`) if you have questions about the code.

## Acknowledgement

This implementation is based on Weiping Song and Chence Shi's [AutoInt](https://github.com/DeepGraphLearning/RecommenderSystems/tree/master/featureRec). We thank them for sharing their code.