iyves/auto-keyphrase-extraction-ru: Automatic Keyphrase Extraction from Russian-Language Scholarly Papers in Computational Linguistics

Undergraduate honors thesis

AUTOMATIC KEYPHRASE EXTRACTION FROM RUSSIAN-LANGUAGE SCHOLARLY PAPERS IN COMPUTATIONAL LINGUISTICS

Yves Wienecke
Thesis advisor: William Comer, Ph.D.

Abstract. The automatic extraction of keyphrases from scholarly papers is an important step for many Natural Language Processing (NLP) tasks, including text retrieval, machine translation, and text summarization. However, because languages differ in their grammatical and semantic intricacies, keyphrase extraction is a highly language-dependent task. Many free and open-source implementations of state-of-the-art keyphrase extraction techniques exist, but they are not adapted to processing Russian text. Furthermore, the multilingual character of scholarly papers in Russian computational linguistics and NLP adds complexity to keyphrase extraction. This project is a free and open-source program that serves as a proof of concept for a topic-clustering approach to automatically extracting keyphrases from the papers of Dialogue, the largest conference on Russian computational linguistics and intellectual technologies. The goal is to use Latent Dirichlet Allocation (LDA) and pyLDAvis to discover the latent topics of the Dialogue conference and to extract the salient keyphrases used by its research community. The conclusion points to needed improvements in PDF text extraction, morphological normalization, and candidate keyphrase ranking.
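To make the topic-modeling step more concrete, here is a minimal, dependency-free sketch of LDA fitted by collapsed Gibbs sampling, followed by taking each topic's highest-count terms as keyphrase candidates. It is an illustration under stated assumptions, not the thesis's implementation: the notebooks are not reproduced here, pyLDAvis (used for visualization) is not involved, and the function names `lda_gibbs` and `top_terms`, the hyperparameters, and the toy lemmatized Russian documents are all hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (hypothetical helper).

    docs: list of token lists, assumed already lemmatized and stop-word filtered.
    Returns (n_kw, vocab), where n_kw[k][w] counts tokens of word w assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    n_dk = [[0] * num_topics for _ in docs]      # document-topic counts
    n_kw = [[0] * V for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                       # total tokens per topic
    z = []                                       # topic assignment of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | rest), up to a normalizing constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wid] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1
    return n_kw, vocab


def top_terms(n_kw, vocab, n=3):
    """Return the n highest-count terms of each topic as keyphrase candidates."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in n_kw
    ]


# Hypothetical toy corpus of already-lemmatized Russian tokens.
docs = [
    ["синтаксический", "анализ", "парсер", "дерево"],
    ["парсер", "дерево", "синтаксический", "зависимость"],
    ["эмбеддинг", "вектор", "семантика", "модель"],
    ["вектор", "модель", "эмбеддинг", "семантика"],
]
n_kw, vocab = lda_gibbs(docs, num_topics=2)
for k, terms in enumerate(top_terms(n_kw, vocab)):
    print(k, terms)
```

Ranking candidates by raw topic counts is the simplest possible choice; the abstract itself notes that candidate keyphrase ranking is one of the areas needing improvement, so a real pipeline would likely use a more careful scoring scheme.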

Keywords: Automatic Keyphrase Extraction, Topic Modeling, LDA, pyLDAvis, Scholarly Papers, Russian.

Honors Symposium Presentation

Running the code

The src/ directory contains three Jupyter notebooks: corpus_creation.ipynb, preprocessing.ipynb, and topic_modeling.ipynb. For ease of use, running the programs from the notebooks is recommended. Download or clone this repository to your local machine, then cd into the src directory and type

jupyter notebook

from a terminal. This should open Jupyter and allow you to open one of the notebooks. Once you have opened a notebook, navigate to a cell and press the Run button to run the code.
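If you would rather run the notebooks non-interactively, one option is nbconvert's `--execute` mode. The sketch below is not part of the repository and rests on assumptions: that the notebooks are meant to run in the order listed above, that Jupyter and nbconvert are installed, and that the script is launched from the repository root. `run_notebooks` and `nbconvert_command` are hypothetical helpers.

```python
import subprocess
from pathlib import Path

# Assumed pipeline order: the order in which the notebooks are listed in this README.
NOTEBOOKS = ("corpus_creation.ipynb", "preprocessing.ipynb", "topic_modeling.ipynb")


def nbconvert_command(notebook):
    """Build a `jupyter nbconvert` call that executes a notebook and saves its outputs in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


def run_notebooks(src_dir="src", notebooks=NOTEBOOKS, dry_run=False):
    """Execute each notebook from inside src_dir, stopping at the first failure.

    With dry_run=True, nothing is executed and only the commands are returned.
    """
    commands = [nbconvert_command(nb) for nb in notebooks]
    if not dry_run:
        for cmd in commands:
            # Run from src/ so relative paths behave as when the notebooks are opened there;
            # check=True raises CalledProcessError if a notebook fails to execute.
            subprocess.run(cmd, cwd=Path(src_dir), check=True)
    return commands
```

For example, `run_notebooks(dry_run=True)` returns the three commands without executing anything, which is a cheap way to check them before starting a potentially long run.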

Repository contents:

data/
papers/
src/
README.md
