Basic background on developing from this codebase.
This repository uses Conda to manage two dependencies: Python and Poetry. (This SO post provides more context on using Conda and Poetry together.)
Install conda or miniconda. Then, create the needed environment, called llm-math-education
.
conda env create -f conda-environment.yml
- Activate the conda environment:
conda activate llm-math-education
- Use
make install
to install all needed dependencies (including the pre-commit hooks).
Ideally, the Makefile would activate the needed conda environment, but I don't actually know enough make
to add that.
To deploy to PyPI, set your API token and deploy (see guide):
poetry config pypi-token.pypi "{your API token generated on pypi.org}"
poetry build
poetry publish
Create a .env
file in the project root with your OpenAI API key defined as an environment variable inside it:
echo "OPENAI_API_KEY={your-api-key-here}" > .env
make test
poetry run <command>
- Run the given command, e.g.poetry run pytest
invokes the tests.source $(poetry env info --path)/bin/activate
- An alternative topoetry shell
that's less buggy in conda environments.poetry add <package>
- Add the given package as a dependency. Use flag-G dev
to add it as a development dependency.conda remove -n llm-math-education --all
- Tear it all down, so first-time setup can be repeated.
To use Pyserini, need to install a few additional things.
First, Java 7+ JDK should be installed and configured reasonably. I installed the latest build (JDK 20.0.1) with no issues.
To install NMSLib: (see issue) CFLAGS="-mavx -DWARN(a)=(a)" pip install --use-pep517 nmslib
Ultimately, I added it to a new dependency group (anserini
) like so:
CFLAGS="-mavx -DWARN(a)=(a)" poetry add nmslib -G anserini
This likely means that rebuilds won't succeed without first installing nmslib with these CFLAGS, although I haven't tested it. Use poetry install --with anserini
to include during a build.
FAISS (needed for Pyserini) installed in the outer conda environment: conda install -c pytorch faiss-cpu
I did not include the FAISS dependency in environment.yaml
.
A few notes about the app.
Manually deployed via levon003
's (GitHub-linked) Streamlit account.
For local development, streamlit secrets need to be stored in .streamlit/secrets.toml
Here's a sample file:
OPENAI_API_KEY = "{key goes here}"
AUTH_TOKEN = "argon2:$argon2id$v=19$m=10240,t=10,p=8$MuVIOw20jkOi1nKR90hPhA$H22nY8aNyfztLYQCSj5NRw5/Cy2WOo6kl3K61RyaoZY"
To generate the auth_token:
>>> from streamlit_app import auth_utils
>>> import getpass
>>> auth_utils.passwd_hash(getpass.getpass())
Password: abc
'argon2:$argon2id$v=19$m=10240,t=10,p=8$VfrxYg66b419O8hFi+vCLA$HZXq1dU0+KMUbOBxJjLaweXoHguJH+2AkPL1EurOOys'
If the AUTH_TOKEN is provided in the secrets, can authenticate automatically via URL parameter, e.g. {base_url}?auth_token=abc
.
Generate your own secure auth token.
First:
>>> from streamlit_app import auth_utils
>>> auth_token = auth_utils.generate_auth_token()
>>> auth_token
'21bbeab31ed2a3ebcd5563df65db64ba'
>>> auth_utils.passwd_hash(auth_token)
'argon2:$argon2id$v=19$m=10240,t=10,p=8$ECyPoKgu51axdr4rgktOCA$qmrYxYoSaVq4uCwm9W5csfltApcKBQRTHLdtQY9OVKI'
Second, put the generated token in secrets.toml
:
AUTH_TOKEN="argon2:$argon2id$v=19$m=10240,t=10,p=8$ECyPoKgu51axdr4rgktOCA$qmrYxYoSaVq4uCwm9W5csfltApcKBQRTHLdtQY9OVKI"
Third, you can then authenticate with {base_url}?auth_token=21bbeab31ed2a3ebcd5563df65db64ba
The standalone hint-generation app is hosted at src/hint_generation/app.py
.
The password-protected demo for the hint generation component is deployed at this URL: https://levi-math-hints.streamlit.app
The Streamlit demo draws on data from a few sources.
OpenStax is a Rice University project to make high-quality math textbooks available (under a CC BY 4.0 license). Access for free at https://openstax.org.
Rori is a project of Rising Academies. Micro-lessons help students learn at their level.
Math Nation is an interactive math tool for middle and high school math. Student questions asked in pre-algebra and algebra forums are a high-validity example of general QA questions asked by middle-school math students.
Math misconception data assembled by Nancy Otero. See this GitHub repository.