EURECOM D2KLab method for the European Statistics Deduplication Challenge 2023

D2KLab/DeduplicationChallenge2023

D2KLab @ European Statistics Awards Deduplication Challenge 2023

This repository contains the source code needed to reproduce the experiments that D2KLab conducted for the European Statistics Awards Deduplication Challenge 2023.

The source code in this repository is based on a Jupyter Notebook, which is available on Google Colab at this link.

Requirements

  • Python >=3.9

How to use

  1. Install required packages using pip:
    pip install -r requirements.txt
  2. Copy the dataset file wi_dataset.csv into the same directory as this source code.
  3. Run main.py:
    python main.py
    After processing, this should create a new file named duplicates.csv.
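The steps above can also be scripted. The following is a minimal sketch, not part of the repository: the helper name `run_pipeline` is hypothetical, and the only details taken from this README are the file names `wi_dataset.csv`, `main.py`, and `duplicates.csv`. It checks that the dataset is in place, runs `main.py` with the current Python interpreter, and confirms that the output file was produced.

```python
import subprocess
import sys
from pathlib import Path


def run_pipeline(workdir: str = ".") -> Path:
    """Run main.py inside workdir and return the path to duplicates.csv.

    Hypothetical helper: it only wraps the manual steps from the README.
    """
    root = Path(workdir)

    # Step 2 of the README: the dataset must sit next to the source code.
    dataset = root / "wi_dataset.csv"
    if not dataset.is_file():
        raise FileNotFoundError(f"Expected dataset at {dataset}")

    # Step 3 of the README: equivalent to running `python main.py` in workdir.
    subprocess.run([sys.executable, "main.py"], cwd=root, check=True)

    # The README states that a file named duplicates.csv should be created.
    output = root / "duplicates.csv"
    if not output.is_file():
        raise RuntimeError("main.py finished but duplicates.csv was not created")
    return output
```

Calling `run_pipeline()` from the repository directory fails early with a clear error if the dataset is missing, rather than partway through processing.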

Results

During the submission phase, our latest experiment obtained the following scores:

Full F1   Semantic F1   Temporal F1   Partial F1   Non-Duplicate F1   Macro F1
0.99      0.81          0.68          0.00         1.00               0.70
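As a sanity check on the scores above, the short sketch below recomputes the Macro F1 from the five per-class values. It assumes that Macro F1 is the unweighted mean of the per-class F1 scores. This README does not state that definition, but it is the usual one and it reproduces the reported 0.70. The function name `macro_f1` is illustrative only.

```python
# Per-class F1 scores copied from the results reported in this README.
per_class_f1 = {
    "Full": 0.99,
    "Semantic": 0.81,
    "Temporal": 0.68,
    "Partial": 0.00,
    "Non-Duplicate": 1.00,
}


def macro_f1(scores: dict) -> float:
    """Unweighted mean of per-class F1 scores (assumed definition)."""
    return sum(scores.values()) / len(scores)


macro = macro_f1(per_class_f1)
print(f"Macro F1: {macro:.2f}")
```

Under this assumption, the Partial class score of 0.00 counts as much as any other class, which is why the macro average sits well below the Full and Non-Duplicate scores.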
