An app that aims to organize your research: A Researcher with a Paper-Based Approach
Author: @stepp1
🚨 DISCLAIMER:
This is a WIP. I worked on it for two weeks and wanted to have an MVP fast. When I started it, I had no experience with either Streamlit or LangChain.
Any feedback is welcome!
Currently, the app supports the following features:
- 📚 LibraryChat: a chatbot that helps you understand a collection of papers. Powered by 🦜️🔗 LangChain, 🤗 Hugging Face Instructor, and 🤖 OpenAI.
- 🔎 PaperExplorer: a tool to explore papers in your library. Powered by 🤗 Hugging Face Instructor and Plotly.
TODOs:
- Add: PaperChat: a chatbot that helps you understand a single paper
- Improve: Full Text Extraction
- Improve: Data Pipelines
- App: Decide whether to re-implement in a more flexible framework (e.g. Flask or FastAPI)
- App: Include pdf viewer when clicking on a paper
- Testing: add tests for data processing, embeddings, etc.
You will need a .env file with the required keys: HUGGINGFACEHUB_API_TOKEN, OPENAI_API_KEY.
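Before launching the apps, it can help to confirm that both keys are actually available. The following is a minimal sketch, assuming the .env file uses plain KEY=VALUE lines; `parse_env_file` and `missing_keys` are hypothetical helper names and are not part of this repository, and the app itself may load the file differently.

```python
import os
from pathlib import Path

# Key names taken from the requirement above.
REQUIRED_KEYS = ("HUGGINGFACEHUB_API_TOKEN", "OPENAI_API_KEY")


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments.

    Assumption: no multi-line values or variable interpolation.
    """
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(path=".env", environ=None):
    """Return required keys set neither in the file nor in the environment."""
    environ = os.environ if environ is None else environ
    file_values = parse_env_file(path)
    return [k for k in REQUIRED_KEYS if not (file_values.get(k) or environ.get(k))]
```

Calling `missing_keys()` from the repository root returns an empty list when both keys are present, and otherwise the names of the keys that still need to be set.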
- Clone the repository
git clone git@github.com:stepp1/research-app.git
- Install the dependencies using conda/mamba
conda env create -f environment.yml
- Activate the environment
conda activate researcher
- Run the apps
chmod +x run.sh
./run.sh
Remember to forward the Streamlit port if you are running the app on a remote server!
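One common way to forward a port is an SSH local tunnel. The sketch below only builds the command to run on your local machine. It assumes Streamlit's default port 8501; run.sh may start the apps on different ports, so check it first. The host `user@my-server` is a placeholder, and `ssh_forward_command` is a hypothetical helper, not part of this repository.

```python
import shlex


def ssh_forward_command(host, remote_port=8501, local_port=None):
    """Build an ssh command that maps a local port to a port on the server.

    -N opens the tunnel without running a remote command;
    -L local:localhost:remote forwards the local port to the server's port.
    """
    local_port = remote_port if local_port is None else local_port
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", host]


# Placeholder host; replace with your own login and server name.
print(shlex.join(ssh_forward_command("user@my-server")))
# prints: ssh -N -L 8501:localhost:8501 user@my-server
```

With the tunnel open, the app should be reachable at http://localhost:8501 in a local browser, provided it really listens on that port on the server.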
We provide a dataset.json file stored at researcher/data/ that contains metadata for the default papers.
The full dataset is currently hosted on Zenodo: https://zenodo.org/record/7653458
More information about the dataset can be found here.
- Download the dataset.json file only:
curl -L https://zenodo.org/record/7653458/files/dataset.json -o researcher/data/dataset.json
- Download the dataset.json file and the images:
curl -L https://zenodo.org/record/7653458/files/data.tar.xz | tar -xJ -C researcher/data/
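After either download, a quick sanity check is to load the metadata file and look at its top-level shape. This is a minimal sketch that deliberately assumes nothing about the schema of dataset.json, which is not described here; `load_dataset` and `summarize` are hypothetical helper names, not functions from this repository.

```python
import json
from pathlib import Path


def load_dataset(path="researcher/data/dataset.json"):
    """Load the metadata file as plain Python objects."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def summarize(data):
    """Describe the top-level shape without assuming a schema."""
    if isinstance(data, dict):
        return f"object with {len(data)} top-level keys: {sorted(data)[:5]}"
    if isinstance(data, list):
        return f"array with {len(data)} entries"
    return f"scalar of type {type(data).__name__}"
```

Running `print(summarize(load_dataset()))` from the repository root shows whether the file parsed as valid JSON and how many top-level entries it holds.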