Skip to content

Latest commit

 

History

History
79 lines (54 loc) · 5.19 KB

README.md

File metadata and controls

79 lines (54 loc) · 5.19 KB

Composable NLP Workflows using Forte for BERT-based Ranking and QA System

End-to-end Ranking and Question-Answering (QA) system using Forte, a toolkit that makes composable NLP pipelines.

You can read the full report here

Tasks

  • Build an end-to-end QA system with following components
    • Full-ranker using ElasticSearch indexer and BM25 algorithm
    • Re-ranker using BERT
    • Question-Answering using BERT

alt text

Datasets

The task has been implemented using two sets of datasets:

How to run the Pipeline

  • Make sure the input and reference file paths are properly set in config.yml

  • Run the followng commands:

    • On Linux/Mac Run export PYTHONPATH="$(pwd):$PYTHONPATH
    • On windows add the code to the file to be runimport sys; sys.path.append('.')
  • Create elastic search index

    • Modify config.yml with proper datasets and elastic search index name
    • Run python src/indexers/msmarco_indexer.py --config_file config.yml
  • Run pipeline

    • Modify config.yml with proper dataset name, filenames, re-ranking size.
    • Run pipeline/msmarco_reranker_qa_pipeline.py --config_file config.yml
  • Results are saved in output folder

  • Changes to be done for Covid QA: Change config_cord.yml. Run the cord_indexer.py to index the cord-19 documents and cord_reranker_qa_pipeline.py to get the answers

Pipeline Information Flow

Experiment Results

MS-MARCO

Result on 1000 queries from dev.small with multiple re-ranking sizes

Full Ranking Reranker QA
Re-Ranking Size Time per Query(s) MRR@10 MRR@100 Recall@10 Recall@100 MRR@10 MRR@100 Recall@10 Recall@100 BLEU-1 BLEU-2 BLEU-3 BLEU-4 ROUGE-L PRECISION RECALL F1 Semantic Sim
1 0.49 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.24 0.15 0.11 0.09 0.22 0.20 0.23 0.21 0.75
10 0.56 0.16 0.16 0.34 0.34 0.23 0.23 0.34 0.34 0.30 0.21 0.17 0.15 0.29 0.26 0.32 0.29 0.79
50 0.85 0.16 0.17 0.34 0.50 0.28 0.28 0.45 0.50 0.31 0.23 0.19 0.17 0.31 0.27 0.34 0.30 0.79
100 1.24 0.16 0.17 0.34 0.59 0.30 0.30 0.50 0.59 0.31 0.24 0.20 0.18 0.32 0.28 0.35 0.31 0.80
500 4.63 0.16 0.17 0.34 0.59 0.33 0.33 0.56 0.73 0.32 0.24 0.21 0.19 0.32 0.29 0.36 0.32 0.80
1000 8.80 0.16 0.17 0.34 0.59 0.34 0.35 0.58 0.77 0.32 0.25 0.21 0.19 0.32 0.29 0.36 0.32 0.80

Covid-19

Results on all covid-qa queries with multiple re-ranking sizes

QA
Re-Ranking Size Time per Query(s) BLEU-1 BLEU-2 BLEU-3 BLEU-4 ROUGE-L PRECISION RECALL F1 Semantic Sim
100 1.21 0.20 0.15 0.13 0.12 0.22 0.18 0.29 0.22 0.71
1000 6.64 0.20 0.15 0.13 0.12 0.22 0.18 0.29 0.22 0.71

Acknowledgements

We would like to thank Professor Dr. Zhiting Hu and Dr. Zhengzhong (Hector) Liu for guiding us throughout the project. We also extend our gratitude towards Petuum Inc. for providing us the computing support needed to run our pipeline on GPU.

Contact

Murali Mohana Krishna Dandu - [email protected] Linkedin LinkedIn   Gaurav Kumar - [email protected] Linkedin LinkedIn