HDI Highlighter

Description

Reading helper for pharmacological interaction articles

HDI-Highlighter is a tool that lets users highlight PDFs of scientific articles about Herb-Drug Interactions (HDI). It provides visual assistance by highlighting terms of interest, including:

DCI (international nonproprietary drug names), herb names and enzyme names. Enzymes are mostly CYP450 isoenzymes, as they are by far the most studied interaction targets.
Study types, e.g. case reports, clinical studies, in vitro studies, ...
Dosages
Percentages
Words implying a variation in a parameter

PDF extraction

To process the content of a PDF file, the first task is to extract it from the file. This is non-trivial, as PDFs are not designed to be reworked. Fortunately, a Python module named "PyMuPDF" (https://github.com/pymupdf/PyMuPDF) makes extraction easy and can return the content of the file in a formatted way. One option extracts it as HTML, which is useful here because:

HTML files are handled directly by Unitex without any preprocessing step, at the cost of a longer processing time
PDF-Highlighter being a web app, the HTML can be displayed directly without reworking it.

PyMuPDF provides a nearly perfect extraction of the PDF, although some imperfections may appear at different steps of the highlighting process.
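
As an illustration of the HTML extraction described above, here is a minimal sketch. It relies on PyMuPDF's `Page.get_text("html")` call; the helper names `pages_to_html` and `extract_html` are hypothetical and are not taken from this repository's text_extractor.py.

```python
def pages_to_html(pages):
    """Concatenate the HTML rendering of each page.

    `pages` is any iterable of objects exposing PyMuPDF's
    `Page.get_text("html")` method, for example an open document.
    """
    return "\n".join(page.get_text("html") for page in pages)


def extract_html(pdf_path):
    """Open a PDF with PyMuPDF and return its content as one HTML string."""
    # PyMuPDF is imported lazily so that pages_to_html has no hard dependency.
    import fitz  # the import name of PyMuPDF

    with fitz.open(pdf_path) as doc:
        return pages_to_html(doc)
```

Calling `extract_html("article.pdf")` (a placeholder file name) would return a string that can be passed to the tagging step or rendered in a template.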

Expressions highlighting

HDI-Highlighter uses automata built with Unitex/GramLab (https://github.com/UnitexGramLab/) and integrated into the Python scripts through its Python bindings (https://github.com/patwat/python-unitex) to extract content.
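
To give an idea of what the highlighting step produces, the sketch below wraps matched expressions in HTML `<mark>` tags. It is not the tool's method: plain regular expressions are used here as a simplified stand-in for the Unitex grammars, and the patterns, the class names and the `highlight` function are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in patterns; the real tool uses Unitex automata instead.
HIGHLIGHT_RE = re.compile(
    r"(?P<percentage>\d+(?:\.\d+)?\s?%)"
    r"|(?P<dosage>\d+(?:\.\d+)?\s?(?:mg|µg|g|mL|ml)(?:/(?:kg|day))?\b)"
    r"|(?P<enzyme>\bCYP\d[A-Z]\d+\b)"
)


def highlight(text):
    """Wrap every match in a <mark> tag whose class names the category."""
    def wrap(match):
        # match.lastgroup is the name of the alternative that matched.
        return '<mark class="{}">{}</mark>'.format(match.lastgroup, match.group(0))

    # A single pass avoids re-matching inside tags that were already inserted.
    return HIGHLIGHT_RE.sub(wrap, text)


print(highlight("CYP3A4 activity fell by 40% after 300 mg/day."))
```

This version works on plain text only. Applied to extracted HTML, it could also match inside tags or attributes, which is one reason a grammar-based tool such as Unitex is better suited to the task.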

Requirements

Python 3
PyMuPDF
python-unitex (https://github.com/patwat/python-unitex), which requires Unitex to be installed (https://unitexgramlab.org/)
Unidecode
Flask
Flaskwebgui
Gunicorn
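
A quick way to check that the Python packages above are importable is sketched below. The import names in `REQUIRED_MODULES` are assumptions (for instance, PyMuPDF is assumed to be imported as `fitz` and python-unitex as `unitex`), and the script does not check that Unitex itself is installed.

```python
from importlib.util import find_spec

# Assumed import names for the packages listed in the requirements.
REQUIRED_MODULES = {
    "PyMuPDF": "fitz",
    "python-unitex": "unitex",
    "Unidecode": "unidecode",
    "Flask": "flask",
    "Flaskwebgui": "flaskwebgui",
    "Gunicorn": "gunicorn",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the sorted names of packages whose module cannot be found."""
    return sorted(
        name for name, module in modules.items() if find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```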

Use

In a terminal, go to the hdi_highlighter folder:

  cd path/to/hdi_highlighter

Run gui.py to start a local server:

  python gui.py

A message in the terminal will give you the address of the local server. Copy the link and open it in any browser.

ancnudde/hdi_highlighter

Folders and files

Latest commit

History

Repository files navigation

HDI Highlighter

Description

PDF extraction

Expressions highlighting

Requirements

Use

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages