a research paper answering engine focused on openly accessible ML papers. give it a query, get back ai-generated responses with citations and figures from relevant papers.
- arxiv + semantic scholar api for paper metadata and pdfs
- paper content processed from pdfs using pymupdf
- frontend: next.js 15 + app router, tailwind, shadcn/ui
- backend: fastapi + uvicorn
- vector store: pinecone (serverless, cloud-hosted)
- llm: openai gpt-4o-mini
- embeddings: openai text-embedding-3-large
- storage: cloudflare r2 for extracted figures
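under the hood it's a standard rag pipeline: pdf text gets chunked, embedded, and stored in the vector index, then retrieved per query. a rough sketch of the chunking step (the function name and chunk sizes are illustrative, not the repo's exact implementation):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """split extracted pdf text into overlapping passages for embedding.

    sizes here are placeholders -- tune for your embedding model and retrieval
    granularity. overlap keeps sentences from being cut off between chunks.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```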
- clone the repo:

```sh
git clone https://github.com/seatedro/arxival.git
cd arxival
```
- set up backend:

```sh
cd server
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on windows
pip install -r requirements.txt
```
- set up env vars:

```sh
# backend (.env in server/)
OPENAI_API_KEY=your_key
PINECONE_API_KEY=your_token
PINECONE_HOST=your_server
R2_ENDPOINT=your_endpoint
R2_ACCESS_KEY_ID=your_key
R2_SECRET_ACCESS_KEY=your_key
```
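a missing var usually surfaces as a confusing runtime error deep in the stack, so it's worth failing fast. a small check you could drop near startup (this helper is illustrative, not part of the repo; the list mirrors the .env above):

```python
import os

# the vars the backend needs, per the .env template above
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_HOST",
    "R2_ENDPOINT",
    "R2_ACCESS_KEY_ID",
    "R2_SECRET_ACCESS_KEY",
]

def missing_env_vars(env=os.environ) -> list[str]:
    """return the names of required vars that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```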
- start the backend:

```sh
python run.py
```
- set up frontend:

```sh
cd ui
npm install
```
- set up frontend env:

```sh
# frontend (.env in ui/)
NEXT_PUBLIC_BACKEND_URL=http://localhost:8000
```
- start the frontend:

```sh
npm run dev
```
- (optional) ingest some papers:

```sh
cd ../server
python cli_batch.py --query "machine learning" --max-papers 50
```
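the batch cli presumably pulls metadata from the arxiv api, which returns atom xml. a sketch of parsing one entry (the sample response is inlined so this runs offline; the field choices are illustrative):

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

def parse_arxiv_entries(atom_xml: str) -> list[dict[str, str]]:
    """pull id/title/summary out of an arxiv api atom response."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS).strip(),
            "summary": entry.findtext("atom:summary", default="", namespaces=ATOM_NS).strip(),
        })
    return entries

# trimmed-down sample of what the arxiv api returns
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/1706.03762v7</id>
    <title>Attention Is All You Need</title>
    <summary>The dominant sequence transduction models are based on...</summary>
  </entry>
</feed>"""
```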
hit up http://localhost:3000 and you're good to go! 🎉