Step by step instructions to install and run the Llama Stack on Linux and Mac #40

Open
jeffxtang opened this issue Aug 9, 2024 · 15 comments
@jeffxtang

jeffxtang commented Aug 9, 2024

I managed to make the Llama Stack server and client work with Ollama on both EC2 (with 24GB GPU) and Mac (tested on 2021 M1 and 2019 2.4GHz i9 MBP, both with 32GB memory). Steps are below:

  1. Open a terminal, go to your work directory, then run:
git clone https://github.com/meta-llama/llama-agentic-system
cd llama-agentic-system
conda create -n llama-stack python=3.10
conda activate llama-stack
pip install -r requirements.txt
  2. If you're on Linux, run:
curl -fsSL https://ollama.com/install.sh | sh

Otherwise, download the Ollama zip for Mac here, unzip it and double click the Ollama.app to move it to the Applications folder.

  3. In the same terminal, run:
ollama pull llama3.1:8b-instruct-fp16

to download the Llama 3.1 8B model and then run:

ollama run llama3.1:8b-instruct-fp16

to confirm it works: enter a question and check that Llama answers.

  4. Now run the command below to install Llama Stack's Ollama distribution:
llama distribution install --spec local-ollama --name ollama

You should see the output below. Press Enter to accept the default settings during "Configuring...", but answer n to the two questions about llama_guard_shield and prompt_guard_shield:

Successfully setup distribution environment. Configuring...
Configuring API surface: inference
Enter value for url (default: http://localhost:11434):

Configuring API surface: safety
Do you want to configure llama_guard_shield? (y/n): n
Do you want to configure prompt_guard_shield? (y/n): n

Configuring API surface: agentic_system

YAML configuration has been written to /Users/<your_name>/.llama/distributions/ollama/config.yaml
Distribution ollama (with spec local-ollama) has been installed successfully!

  5. Launch the ollama distribution by running:
llama distribution start --name ollama --port 5000
  6. Finally, in another terminal, go to the llama-agentic-system folder, then run:
conda activate ollama

and either (on Mac)

python examples/scripts/vacation.py localhost 5000 --disable_safety

or (on Linux)

python examples/scripts/vacation.py [::] 5000 --disable_safety

You should see output starting with the lines below. Note: if you start the script right after Step 5, especially on a slower machine such as the 2019 Mac with a 2.4GHz i9, you may see "httpcore.ReadTimeout" because the Llama model is still being loaded; wait a moment and retry, a few times if necessary:

User> I am planning a trip to Switzerland, what are the top 3 places to visit?
StepType.inference> Switzerland is a beautiful country with a rich history, stunning landscapes, and vibrant culture. Here are three top places to visit in Switzerland:

  1. Jungfraujoch: Also known as the "Top of Europe," Jungfraujoch is the highest train station in Europe, located at an altitude of 3,454 meters (11,332 feet) above sea level. It offers breathtaking views of the surrounding mountains and glaciers, including the iconic Eiger, Mönch, and Jungfrau peaks.

In the first terminal, the one running llama distribution start --name ollama --port 5000, you should see:

INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
Environment: ipython
Tools: brave_search, wolfram_alpha, photogen

Cutting Knowledge Date: December 2023
Today Date: 09 August 2024

INFO: ::1:50987 - "POST /agentic_system/create HTTP/1.1" 200 OK
INFO: ::1:50988 - "POST /agentic_system/session/create HTTP/1.1" 200 OK
INFO: ::1:50989 - "POST /agentic_system/turn/create HTTP/1.1" 200 OK
role='user' content='I am planning a trip to Switzerland, what are the top 3 places to visit?'
Pulling model: llama3.1:8b-instruct-fp16
Assistant: Switzerland is a beautiful country with a rich history, stunning landscapes, and vibrant culture. Here are three top places to visit in Switzerland:

  1. Jungfraujoch: Also known as the "Top of Europe," Jungfraujoch is a mountain peak located in the Bernese Alps. It's the highest train station in Europe, offering breathtaking views of the surrounding mountains, glaciers, and valleys. You can take a ride on the Jungfrau Railway, which takes you to the summit, where you can enjoy stunning vistas, visit the Ice Palace, and even ski or snowboard in the winter.
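
The "wait a moment and retry" advice from Step 6 could be automated with a small wrapper. This is only a sketch: `run_with_retries` is a hypothetical helper, not part of Llama Stack, and it assumes the example script exits with a non-zero status when it hits "httpcore.ReadTimeout" (which is what an uncaught Python exception normally produces).

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=5, delay=10.0):
    """Run cmd until it exits with status 0 or the attempts run out.

    Returns True on the first successful run, False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        print(
            f"attempt {attempt}/{attempts} failed "
            f"(exit code {result.returncode})",
            file=sys.stderr,
        )
        # Give the model more time to load before trying again.
        if attempt < attempts:
            time.sleep(delay)
    return False


# Example call (Mac host shown; run from the llama-agentic-system folder
# with the distribution's conda environment active):
# run_with_retries(
#     [sys.executable, "examples/scripts/vacation.py",
#      "localhost", "5000", "--disable_safety"]
# )
```

The attempt count and delay are arbitrary defaults; a slower machine may need a longer delay.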

Bonus: To see tool calling (see here and here for more info) in action, try the hello.py example. It first sends the prompt "Hello", then asks Llama "Which players played in the winning team of the NBA western conference semifinals of 2024, please use tools", a question whose answer needs a web search tool. On Mac, run (replace localhost with [::] on Linux):

python examples/scripts/hello.py localhost 5000 --disable_safety

You should see output that includes "BuiltinTool.brave_search", as below (if you see "httpcore.ReadTimeout", retrying should work):

User> Hello
StepType.inference> Hello! How can I assist you today?
User> Which players played in the winning team of the NBA western conference semifinals of 2024, please use tools
StepType.inference> brave_search.call(query="NBA Western Conference Semifinals 2024 winning team players")
StepType.tool_execution> Tool:BuiltinTool.brave_search Args:{'query': 'NBA Western Conference Semifinals 2024 winning team players'}
StepType.tool_execution> Tool:BuiltinTool.brave_search Response:{"query": null, "top_k": []}
StepType.shield_call> No Violation
StepType.inference> I need to search for information about the 2024 NBA Western Conference Semifinals.

If you delete "please use tools" from the prompt in hello.py, not wanting to beg, you'll likely see this output instead:

I'm not able to provide real-time information. However, I can suggest some possible sources where you may be able to find the information you are looking for.

With an appropriate system prompt, or a larger Llama 3.1 model (details coming soon), you should be able to get tool calls without having to ask so politely.
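
The host argument in Step 6 and in the bonus example depends on the platform (localhost on Mac, [::] on Linux). A minimal sketch of choosing it automatically is below. `default_host` and `example_command` are hypothetical helpers, and treating every non-Mac system as Linux is an assumption, since only Mac and Linux were tested in this post.

```python
import platform


def default_host(system=None):
    """Return the host argument this post uses for the given OS name."""
    system = system or platform.system()
    # platform.system() reports "Darwin" on macOS. Anything else is
    # treated as Linux here, which is an assumption of this sketch.
    return "localhost" if system == "Darwin" else "[::]"


def example_command(script, port=5000, system=None):
    """Build the argument list for one of the example scripts."""
    return [
        "python",
        script,
        default_host(system),
        str(port),
        "--disable_safety",
    ]


print(" ".join(example_command("examples/scripts/hello.py")))
```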

@jeffxtang jeffxtang changed the title Step by step instructions to install and run the Llama Stack on Linux and Mac Quick guide to install and run the Llama Stack on Linux and Mac Aug 9, 2024
@jeffxtang jeffxtang changed the title Quick guide to install and run the Llama Stack on Linux and Mac Step by step instructions to install and run the Llama Stack on Linux and Mac Aug 10, 2024
@amkoupaei

I'm on Ubuntu, and Step 4 gives the error below. Any help is greatly appreciated.

image

@jeffxtang
Author

Your error message says "Conda environment 'ollama' exists". Did you run Step 4 more than once? What does "conda env list|grep ollama" show? Can you try "llama distribution install --spec local-ollama --name ollama2", assuming "ollama2" doesn't exist, and then use "ollama2" instead of "ollama" in Steps 5 and 6?
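
The name check could also be scripted. The sketch below parses the output of `conda env list --json`, which lists environment paths under an "envs" key, and picks the first unused name. `env_names` and `pick_free_name` are hypothetical helpers, and taking the last path component as the environment name is a simplification (the base environment shows up under its install folder name).

```python
import json
import os


def env_names(conda_json):
    """Extract environment names from `conda env list --json` output."""
    paths = json.loads(conda_json).get("envs", [])
    # Use the last path component as the name, e.g. ".../envs/ollama".
    return {os.path.basename(p.rstrip("/")) for p in paths}


def pick_free_name(base, taken):
    """Return base if unused, else base2, base3, ... (first free one)."""
    if base not in taken:
        return base
    n = 2
    while f"{base}{n}" in taken:
        n += 1
    return f"{base}{n}"


# Example with made-up paths; in practice the JSON would come from
# subprocess.run(["conda", "env", "list", "--json"], capture_output=True).
sample = '{"envs": ["/home/me/miniconda3", "/home/me/miniconda3/envs/ollama"]}'
print(pick_free_name("ollama", env_names(sample)))
```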

@amkoupaei

amkoupaei commented Aug 12, 2024

conda env list|grep ollama gives
image

llama distribution install --spec local-ollama --name ollama2 gives the same error as the original screenshot

@dltn
Contributor

dltn commented Aug 12, 2024

I see PS1: unbound variable (install_distribution.sh sets -e), so I suspect that there's an issue with the prompt when the script attempts to activate the environment. @amkoupaei, are you able to create/use other conda environments successfully? Also, any reason you need to run as root?

@amkoupaei

Noted - thank you.

I can create other conda envs successfully. There's also no need for root; I only tried that route while debugging this issue. Running as non-root gives the same error.

@hardikjshah
Contributor

@amkoupaei I don't have an Ubuntu machine at hand to try this right now, but from some early debugging it seems that updating line 111 in install_distribution.sh to

python_interp=$(conda run --no-capture-output -n "$env_name" which python)

might fix the issue. Can you give it a try and see if it works for you?

@amkoupaei

Unfortunately, that did not work either.
I also tried this on a fresh Ubuntu EC2 instance and hit the same issue.

@jeffxtang
Author

I just tried on a fresh EC2 instance too and it worked for me - the complete log of "llama distribution" is here. What does your log or diff look like? @amkoupaei

@amkoupaei

Here are the logs:

logs.log

@ashwinb
Contributor

ashwinb commented Aug 15, 2024

Really odd. Can you run conda run -n agentic_env which python in your shell and paste what it outputs? Does it succeed?

@ashwinb
Contributor

ashwinb commented Aug 15, 2024

I simplified a bit: meta-llama/llama-stack@0d933ac

Can you see if this helps?

@amkoupaei

Yes, that succeeds and prints the location of the Python installation.
I might take an alternative path and use models already deployed in the cloud.

Thank you all for your help/support.

@ashwinb ashwinb closed this as completed Aug 15, 2024
@ashwinb ashwinb reopened this Aug 15, 2024
@ashwinb
Contributor

ashwinb commented Aug 15, 2024

@hardikjshah @dltn we need to host these instructions (these are great!) somewhere in our READMEs or instructions for Ollama. What would be the right place?

@HabebNawatha
Contributor

Running the command:
llama distribution install --spec local-ollama --name ollama
gives this output:
usage: llama [-h] {download,model,stack} ...
llama: error: argument {download,model,stack}: invalid choice: 'distribution' (choose from 'download', 'model', 'stack')
I'm new here and trying to run Llama on a Mac.
"distribution" doesn't seem to be a valid argument to llama.
Any help would be appreciated.

@heyjustinai
Member

Hi @HabebNawatha, please try the quick start guide here to run Llama Stack on Mac.
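
For context, the error above has the form argparse produces when a subcommand is not registered. The sketch below is a hypothetical reconstruction, not the actual llama CLI source: it registers only the three subcommands listed in the error message (download, model, stack) and shows that "distribution" is rejected. The exact wording of the error varies with the Python version.

```python
import argparse


def build_parser():
    """Build a stand-in parser with the subcommands named in the error."""
    parser = argparse.ArgumentParser(prog="llama")
    subparsers = parser.add_subparsers()
    for name in ("download", "model", "stack"):
        subparsers.add_parser(name)
    return parser


def is_known_subcommand(name):
    """Return True if the stand-in parser accepts the subcommand."""
    try:
        build_parser().parse_args([name])
    except SystemExit:
        # argparse prints the usage and error to stderr, then exits.
        return False
    return True


print(is_known_subcommand("stack"))
print(is_known_subcommand("distribution"))
```

In other words, the installed CLI no longer exposes the "distribution" subcommand used in the original steps, which is why a newer guide is needed.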
