This is the official repo for the paper FedCVT: Semi-supervised Vertical Federated Learning with Cross-View Training. The arXiv version is available here.
Note that this codebase is the PyTorch implementation of FedCVT; the original implementation was based on TensorFlow. As a result, the empirical results of the PyTorch version may differ from those of the TensorFlow version reported in the paper.
The workflow of FedCVT is described below and illustrated in the following figure.
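For readers who prefer code, here is a minimal, runnable toy sketch of FedCVT's three ingredients as described in the paper: estimating representations for missing features, pseudo-labeling non-aligned samples, and jointly training three classifiers on different views. All names, shapes, layer choices, and the dot-product estimation are illustrative assumptions, not the repo's actual API:

```python
import torch
import torch.nn as nn

# Toy sketch of FedCVT (illustrative only): two parties A and B, where only a
# small set of samples is aligned (features at both parties, labels known).

torch.manual_seed(0)
d_A, d_B, d_h, n_classes = 8, 8, 16, 3
enc_A, enc_B = nn.Linear(d_A, d_h), nn.Linear(d_B, d_h)  # local feature encoders
clf_A = nn.Linear(d_h, n_classes)        # view 1: party A's representation only
clf_B = nn.Linear(d_h, n_classes)        # view 2: party B's representation only
clf_AB = nn.Linear(2 * d_h, n_classes)   # view 3: combined representation

# Aligned samples (both parties hold features; labels available).
x_A_al, x_B_al = torch.randn(32, d_A), torch.randn(32, d_B)
y_al = torch.randint(0, n_classes, (32,))
# Non-aligned samples held only by party A (no labels, no B-side features).
x_A_nl = torch.randn(64, d_A)

h_A_al, h_B_al, h_A_nl = enc_A(x_A_al), enc_B(x_B_al), enc_A(x_A_nl)

# (1) Estimate B's missing representations for A's non-aligned samples by
#     attending over B's aligned representations (a simplifying assumption).
attn = torch.softmax(h_A_nl @ h_A_al.t() / d_h ** 0.5, dim=1)
h_B_est = attn @ h_B_al

# (2) Predict pseudo-labels for the expanded (non-aligned) training samples.
logits_nl = clf_AB(torch.cat([h_A_nl, h_B_est], dim=1))
pseudo_y = logits_nl.argmax(dim=1).detach()

# (3) Train the three classifiers jointly on their respective views.
ce = nn.CrossEntropyLoss()
loss = (ce(clf_A(h_A_al), y_al)
        + ce(clf_B(h_B_al), y_al)
        + ce(clf_AB(torch.cat([h_A_al, h_B_al], dim=1)), y_al)
        + ce(logits_nl, pseudo_y))
loss.backward()
print(f"toy joint loss: {loss.item():.4f}")
```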
We use the following datasets in our experiments:
- NUSWIDE: can be downloaded here or here.
- Avazu: included in the `data` directory of this repo.
- CIFAR10: can be downloaded using PyTorch (see the sketch after this list).
- Vehicle: can be downloaded here under the name "SensIT Vehicle (combined)".
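For CIFAR10, a minimal sketch of the standard torchvision download (the `root` path is an assumption; point it at your local data directory):

```python
import torchvision
import torchvision.transforms as transforms

# Download CIFAR10 into ./data (created if missing) and convert images to tensors.
transform = transforms.ToTensor()
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)
print(len(train_set), len(test_set))  # 50000 10000
```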
You can also adapt the code to run on any other dataset.
The entry points for running the experiments on Avazu, BHI, and NUSWIDE are
- `fedcvt_avazu_exp_run.py`,
- `fedcvt_bhi_exp_run.py`, and
- `fedcvt_nuswide_exp_run.py`, respectively.
You can change the hyperparameters in these Python files.
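For example, after adjusting the hyperparameters inside `fedcvt_nuswide_exp_run.py`, launch the NUS-WIDE experiment with `python fedcvt_nuswide_exp_run.py`; the other two scripts work the same way.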
If you find our work helpful and use our code in your work, please cite our paper:
```bibtex
@article{kang2022fedcvt,
  author = {Kang, Yan and Liu, Yang and Liang, Xinle},
  title = {FedCVT: Semi-supervised Vertical Federated Learning with Cross-view Training},
  year = {2022},
  issue_date = {August 2022},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {13},
  number = {4},
  issn = {2157-6904},
  url = {https://doi.org/10.1145/3510031},
  doi = {10.1145/3510031},
  abstract = {Federated learning allows multiple parties to build machine learning models collaboratively without exposing data. In particular, vertical federated learning (VFL) enables participating parties to build a joint machine learning model based upon distributed features of aligned samples. However, VFL requires all parties to share a sufficient amount of aligned samples. In reality, the set of aligned samples may be small, leaving the majority of the non-aligned data unused. In this article, we propose Federated Cross-view Training (FedCVT), a semi-supervised learning approach that improves the performance of the VFL model with limited aligned samples. More specifically, FedCVT estimates representations for missing features, predicts pseudo-labels for unlabeled samples to expand the training set, and trains three classifiers jointly based upon different views of the expanded training set to improve the VFL model’s performance. FedCVT does not require parties to share their original data and model parameters, thus preserving data privacy. We conduct experiments on NUS-WIDE, Vehicle, and CIFAR10 datasets. The experimental results demonstrate that FedCVT significantly outperforms vanilla VFL that only utilizes aligned samples. Finally, we perform ablation studies to investigate the contribution of each component of FedCVT to the performance of FedCVT.},
  journal = {ACM Trans. Intell. Syst. Technol.},
  month = {may},
  articleno = {64},
  numpages = {16},
  keywords = {Vertical federated learning, semi-supervised learning, cross-view training}
}
```