seatedro/arxival

arXival 📚

a research paper answering engine focused on openly accessible ML papers. give it a query, get back ai-generated responses with citations and figures from relevant papers.

what's under the hood 🔧

data sources

  • arxiv + semantic scholar api for paper metadata and pdfs
  • paper content processed from pdfs using pymupdf
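The README doesn't show the fetching code. Here is a rough, hypothetical sketch of the arXiv half only; Semantic Scholar and the pymupdf step are left out. It queries the public arXiv export API, which returns an Atom feed. Each entry in that feed carries an id, a title, an abstract and a PDF link. The function names are illustrative and are not taken from arxival.

```python
from __future__ import annotations

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # namespace used by arXiv feeds


def build_query_url(query: str, max_results: int = 50, start: int = 0) -> str:
    """Search all fields for `query`, returning at most `max_results` entries."""
    params = {"search_query": f"all:{query}", "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def _clean(text: str) -> str:
    """Collapse the line breaks and indentation arXiv leaves in titles/abstracts."""
    return " ".join(text.split())


def parse_feed(atom_xml: bytes | str) -> list[dict]:
    """Extract id, title, abstract and PDF link from each <entry> of a feed."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        # the PDF is the <link> whose title attribute is "pdf"
        pdf_url = next(
            (link.get("href") for link in entry.findall("atom:link", ATOM)
             if link.get("title") == "pdf"),
            None,
        )
        papers.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM),
            "title": _clean(entry.findtext("atom:title", default="", namespaces=ATOM)),
            "abstract": _clean(entry.findtext("atom:summary", default="", namespaces=ATOM)),
            "pdf_url": pdf_url,
        })
    return papers


def fetch_papers(query: str, max_results: int = 50) -> list[dict]:
    """Network call: download the feed for `query` and parse it."""
    with urllib.request.urlopen(build_query_url(query, max_results), timeout=30) as resp:
        return parse_feed(resp.read())
```

`fetch_papers("machine learning", max_results=50)` performs the actual request. `parse_feed` is kept separate so it can be checked against a saved feed without network access. A batch ingester would also want to space out repeated calls, since the arXiv API asks clients to keep request rates modest.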

tech stack

  • frontend: next.js 15 + app router, tailwind, shadcn/ui
  • backend: fastapi + uvicorn
  • vector store: chromadb (cloud-hosted)
  • llm: openai gpt-4o-mini
  • embeddings: openai text-embedding-3-large
  • storage: cloudflare r2 for extracted figures
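The vector store, the embeddings and the llm presumably fit together in a retrieve-then-generate loop. This is a minimal sketch of that step under stated assumptions, not arxival's code. Brute-force cosine similarity over an in-memory list stands in for chromadb, and toy 2-d vectors stand in for text-embedding-3-large output. The returned string is the kind of prompt that would be sent to gpt-4o-mini. All names and the prompt wording are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    paper_id: str            # hypothetical: whatever id the ingester stores
    text: str                # a passage extracted from the paper's PDF
    embedding: list[float]   # stand-in for a text-embedding-3-large vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Brute-force nearest neighbours; a real vector store does this with an index."""
    return sorted(chunks, key=lambda c: cosine(query_vec, c.embedding), reverse=True)[:k]


def build_prompt(question: str, hits: list[Chunk]) -> str:
    """Number each retrieved chunk so the model can cite it inline as [1], [2], ..."""
    sources = "\n\n".join(
        f"[{i}] ({c.paper_id}) {c.text}" for i, c in enumerate(hits, start=1)
    )
    return (
        "Answer using only the sources below and cite them inline as [n].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    chunks = [
        Chunk("paper-a", "transformers rely on self-attention", [1.0, 0.0]),
        Chunk("paper-b", "cnns rely on convolutions", [0.0, 1.0]),
        Chunk("paper-c", "efficient attention variants", [0.9, 0.1]),
    ]
    print(build_prompt("what is attention?", top_k([1.0, 0.0], chunks, k=2)))
```

Numbering the sources in the prompt lets the `[n]` markers in the answer be mapped back to specific papers. In the real stack, the vectors would come from the embeddings endpoint of the `openai` Python SDK, i.e. `client.embeddings.create(model="text-embedding-3-large", input=...)`.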

running locally 🚀

  1. clone the repo:
     git clone https://github.com/seatedro/arxival.git
     cd arxival
  2. set up backend:
     cd server
     python -m venv .venv
     source .venv/bin/activate  # or `.venv\Scripts\activate` on windows
     pip install -r requirements.txt
  3. set up env vars:
     # backend (.env in server/)
     OPENAI_API_KEY=your_key
     PINECONE_API_KEY=your_token
     PINECONE_HOST=your_server
     R2_ENDPOINT=your_endpoint
     R2_ACCESS_KEY_ID=your_key
     R2_SECRET_ACCESS_KEY=your_key
  4. start the backend (it keeps running in this terminal):
     python run.py
  5. set up frontend (in a new terminal, from the repo root):
     cd ui
     npm install
  6. set up frontend env:
     # frontend (.env in ui/)
     NEXT_PUBLIC_BACKEND_URL=http://localhost:8000
  7. start the frontend:
     npm run dev
  8. (optional) ingest some papers (in another terminal, from the repo root):
     cd server
     source .venv/bin/activate  # or `.venv\Scripts\activate` on windows
     python cli_batch.py --query "machine learning" --max-papers 50

hit up http://localhost:3000 and you're good to go! 🎉
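If http://localhost:3000 doesn't come up, a quick preflight check can narrow down which piece is missing. This is a hypothetical helper script, not part of the repo, meant to be run from the repo root. It checks three things:
- that server/.env sets every key from the backend env example to a non-placeholder value,
- that ui/.env sets NEXT_PUBLIC_BACKEND_URL,
- whether anything is listening on ports 8000 and 3000, the ports assumed by the frontend env example and the URL above.

The .env parser is deliberately minimal and is not python-dotenv.

```python
import socket
from pathlib import Path

# Keys from the backend .env example in this README.
BACKEND_KEYS = [
    "OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_HOST",
    "R2_ENDPOINT", "R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY",
]
FRONTEND_KEYS = ["NEXT_PUBLIC_BACKEND_URL"]


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=value parser: skips blank lines and # comments, strips quotes.
    Unlike python-dotenv it does not handle `export`, escapes or multiline values."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def load_env(path: Path) -> dict[str, str]:
    """Parse `path` if it exists, else return an empty mapping."""
    return parse_env(path.read_text()) if path.exists() else {}


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Required keys that are absent, empty, or still a `your_...` placeholder."""
    return [k for k in required if not env.get(k) or env[k].startswith("your_")]


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("server/.env missing:", missing_keys(load_env(Path("server/.env")), BACKEND_KEYS) or "none")
    print("ui/.env missing:", missing_keys(load_env(Path("ui/.env")), FRONTEND_KEYS) or "none")
    print("backend on :8000:", port_open("localhost", 8000))
    print("frontend on :3000:", port_open("localhost", 3000))
```

Save it under any name (for example a hypothetical `preflight.py`) in the repo root and run it with `python preflight.py`.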