Added new config variable API_BASE_URL #477
Conversation
Force-pushed from c17e7c5 to 85c204e (compare)
Works brilliantly for me running Ollama in a Docker container. I saw you mention somewhere that you were looking for people to test, so here's my confirmation. I'll probably try it out with the listener endpoints in text-generation-webui as well and can comment on that here. Brilliant work. Been using sgpt for a while now, and it's nice to slowly start moving to something fully locally hosted. :)
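For anyone wanting to reproduce this, here is a minimal sketch of what the setup might look like, assuming shell_gpt's usual config file location and Ollama's default port; the key name API_BASE_URL comes from this PR, and whether a path suffix is required may depend on your shell_gpt version:

```sh
# ~/.config/shell_gpt/.sgptrc — shell_gpt configuration (assumed default location)
# Point requests at a local Ollama instance instead of api.openai.com.
# 11434 is Ollama's default port; adjust host/port if you run it elsewhere.
API_BASE_URL=http://localhost:11434
```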
That is quite amazing @hrfried! Thanks @TheR1D for putting this amazing piece of software together! My intention, and I'm taking baby steps right now, is just to educate myself:
Sorry for digressing, but there seem to be like-minded folks here. Regards,
Currently just my main desktop, which has a Ryzen 9 7900X, an NVIDIA 4060 Ti with 16 GB of VRAM, and 64 GB of DDR5. I've gotten small models to run okay on less powerful hardware, but they weren't really performant. Mostly just using LLMs for productivity and exploring the space, training some LoRAs on codebases for work to see what's feasible, etc. Nothing crazy, really. Not super familiar with RAG, to be honest, but Ollama is pretty simple to use. I hadn't used it until today, when I saw it was possible to set it as an endpoint in shell_gpt; I normally use other methods (e.g. text-generation-webui) for running local LLMs. Just a docker pull and a docker run, honestly (see the sketch below). I don't know a whole lot about "true" cloud computing, but I imagine you could run an nginx (or similar) reverse proxy into Ollama with a docker-compose workflow and it'd be pretty simple, at least as a test case. Not sure what kind of performance you'd get on shared servers, though, especially without GPU compute, or whether you'd be locked into the cloud provider's networking tools. A little out of my wheelhouse, ha.
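For context, the "docker pull and a docker run" workflow mentioned above is roughly the following, a sketch based on Ollama's published ollama/ollama image and its default port; the --gpus flag assumes the NVIDIA Container Toolkit is installed, and llama2 is just an example model:

```sh
# Pull the official Ollama image.
docker pull ollama/ollama

# Run it detached, keeping downloaded models in a named volume and
# exposing Ollama's default API port (11434) to the host.
# Drop --gpus=all on CPU-only machines.
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull and chat with a model inside the container to verify the setup.
docker exec -it ollama ollama run llama2
```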
This repo's issues should probably not be polluted with off-topic discussion, so I'll just conclude here by posting a few pointers to the various topics touched upon.
That looks pretty good! Still, I took a blind shot at installing the Ollama Docker image on a 2x vCPU, 4 GB RAM machine running the latest stable Debian. I've been researching a few other topics, and here are some pointers for everyone's benefit:
I'll probably stay away from cloud compute, due to the prohibitive costs, especially as we scale towards production. Thanks @TheR1D and @hrfried for the great software and valuable inputs!
Identifiers referenced in the discussion: API_BASE_URL, OPENAI_BASE_URL, show_messages.