Merge branch 'multi-turn-v1.3' of https://github.com/minmin-intel/GenAIComps into multi-turn-v1.3
minmin-intel committed Jan 31, 2025
2 parents 1a59e13 + cea91ac commit 17a8205
Showing 106 changed files with 1,410 additions and 1,149 deletions.
8 changes: 0 additions & 8 deletions .github/workflows/docker/compose/animation-compose.yaml
@@ -7,11 +7,3 @@ services:
build:
dockerfile: comps/animation/src/Dockerfile
image: ${REGISTRY:-opea}/animation:${TAG:-latest}
wav2lip:
build:
dockerfile: comps/third_parties/wav2lip/src/Dockerfile
image: ${REGISTRY:-opea}/wav2lip:${TAG:-latest}
wav2lip-gaudi:
build:
dockerfile: comps/third_parties/wav2lip/src/Dockerfile.intel_hpu
image: ${REGISTRY:-opea}/wav2lip-gaudi:${TAG:-latest}
2 changes: 1 addition & 1 deletion .github/workflows/pr-helm-test.yaml
@@ -63,7 +63,7 @@ jobs:
Chart-test:
needs: [job1]
if: always() && ${{ needs.job1.outputs.run_matrix.service.length }} > 0
if: always() && ${{ fromJSON(needs.job1.outputs.run_matrix).length != 0 }}
uses: ./.github/workflows/_run-helm-chart.yml
strategy:
matrix: ${{ fromJSON(needs.job1.outputs.run_matrix) }}
22 changes: 15 additions & 7 deletions .github/workflows/push-image-build.yml
@@ -32,22 +32,28 @@ jobs:
- name: Get Test Services
id: get-services
run: |
set -x
base_commit=$(git rev-parse HEAD~1)
merged_commit=$(git log -1 --format='%H')
# git diff --name-only ${base_commit} ${merged_commit} | grep -E "cores|comps/__init__.py" | grep -Ev ".md"
# if [ $? -eq 0 ]; then
if git diff --name-only ${base_commit} ${merged_commit} | grep -E "cores|comps/__init__.py" | grep -Ev ".md"; then
echo "ALL image build!!!"
services=$(basename -a .github/workflows/docker/compose/*-compose.yaml | sed 's/-compose.yaml//' | jq -R '.' )
else
changed_src="$(git diff --name-only ${base_commit} ${merged_commit} | grep 'src/' | grep -vE '\.md')" || true
changed_yamls="$(git diff --name-only ${base_commit} ${merged_commit} | grep '.github/workflows/docker/compose/')" || true
services=$(printf '%s\n' "${changed_src[@]}" | cut -d'/' -f2 | grep -vE '\.py' | sort -u | jq -R '.' ) || true
while IFS= read -r line; do
filename=$(basename "$line" -compose.yaml)
echo "$line $(printf '%s\n' "$filename" | jq -R '.' )"
services+=" $(printf '%s\n' "$filename" | jq -R '.' )" || true
done <<< "$changed_yamls"
[[ -n "$changed_src" ]] && services=$(printf '%s\n' "${changed_src[@]}" | cut -d'/' -f2 | grep -vE '\.py' | sort -u | jq -R '.' ) || true
if [[ -n "$changed_yamls" ]]; then
while IFS= read -r line; do
filename=$(basename "$line" -compose.yaml)
echo "$line $(printf '%s\n' "$filename" | jq -R '.' )"
services+=" $(printf '%s\n' "$filename" | jq -R '.' )" || true
done <<< "$changed_yamls"
else
echo "No changes in YAML files."
fi
fi
echo "services=$(echo "$services" | jq -sc 'unique | sort')"
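The script above leans on one `jq` idiom: quote each raw service name into a JSON string with `jq -R '.'`, then slurp the whole stream into a de-duplicated array with `jq -sc 'unique | sort'`. A standalone sketch of that idiom, with hypothetical service names (assumes `jq` is installed):

```shell
# Collect candidate service names, one per line, as JSON strings (jq -R quotes each line).
services=$(printf '%s\n' agent animation agent | jq -R '.')

# Append one more entry the same way the workflow does for changed compose files.
services+=" $(printf '%s\n' wav2lip | jq -R '.')"

# Slurp the stream of JSON strings into an array, de-duplicate, and emit compact JSON.
result=$(echo "$services" | jq -sc 'unique | sort')
echo "services=${result}"
# prints services=["agent","animation","wav2lip"]
```

Note that `jq`'s `unique` already returns a sorted array, so the trailing `sort` is a no-op kept for clarity.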
@@ -56,6 +62,7 @@ jobs:

image-build:
needs: get-build-matrix
if: ${{ fromJSON(needs.get-build-matrix.outputs.services).length != 0 }}
strategy:
matrix:
service: ${{ fromJSON(needs.get-build-matrix.outputs.services) }}
@@ -65,6 +72,7 @@ steps:
steps:
- name: Clean up Working Directory
run: |
echo "matrix.service=${{ matrix.service }}"
sudo rm -rf ${{github.workspace}}/*
- name: Checkout out Repo
4 changes: 0 additions & 4 deletions comps/agent/src/Dockerfile
@@ -15,8 +15,6 @@ RUN useradd -m -s /bin/bash user && \
mkdir -p /home/user && \
chown -R user /home/user/

USER user

COPY comps /home/user/comps

RUN pip install --no-cache-dir --upgrade pip setuptools && \
@@ -28,8 +26,6 @@ RUN pip install --no-cache-dir --upgrade pip setuptools && \

ENV PYTHONPATH=/home/user

USER root

RUN mkdir -p /home/user/comps/agent/src/status && chown -R user /home/user/comps/agent/src/status

USER user
60 changes: 34 additions & 26 deletions comps/agent/src/README.md
@@ -22,19 +22,20 @@ We currently support the following types of agents. Please refer to the example
### 1.2 LLM engine

Agents use an LLM for reasoning and planning. We support two LLM engine options:

1. Open-source LLMs served with vllm. Follow the instructions in [Section 2.2](#22-start-agent-microservices-with-vllm).
2. OpenAI LLMs via API calls. To use OpenAI LLMs, specify `llm_engine=openai` and `export OPENAI_API_KEY=<your-openai-key>`.

| Agent type | `strategy` arg | Validated LLMs (serving SW) | Notes | Example config yaml |
| ---------------- | ----------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------- |
| ReAct | `react_langchain` | [llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (vllm-gaudi) | Only allows tools with one input variable | [react_langchain yaml](../../../tests/agent/react_langchain.yaml) |
| ReAct | `react_langgraph` | GPT-4o-mini, [llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (vllm-gaudi) | If using vllm, specify `--enable-auto-tool-choice --tool-call-parser ${model_parser}` (refer to the vllm docs for more info). Only one tool call per LLM output due to the limitations of the llama3.1 model and the vllm tool-call parser. | [react_langgraph yaml](../../../tests/agent/react_vllm.yaml) |
| ReAct | `react_llama` | [llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct), [llama3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)(vllm-gaudi) | Recommended for open-source LLMs, supports multiple tools and parallel tool calls. | [react_llama yaml](../../../tests/agent/reactllama.yaml) |
| RAG agent | `rag_agent` | GPT-4o-mini | | [rag_agent yaml](../../../tests/agent/ragagent_openai.yaml) |
| RAG agent | `rag_agent_llama` | [llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct), [llama3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) (vllm-gaudi) | Recommended for open-source LLMs, only allows 1 tool with input variable to be "query" | [rag_agent_llama yaml](../../../tests/agent/ragagent.yaml) |
| Plan and execute | `plan_execute` | GPT-4o-mini, [llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (vllm-gaudi) | use `--guided-decoding-backend lm-format-enforcer` when launching vllm. | [plan_execute yaml](../../../tests/agent/planexec_openai.yaml) |
| SQL agent | `sql_agent_llama` | [llama3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct), [llama3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) (vllm-gaudi) | database query tool is natively integrated using Langchain's [QuerySQLDataBaseTool](https://python.langchain.com/api_reference/community/tools/langchain_community.tools.sql_database.tool.QuerySQLDatabaseTool.html). User can also register their own tools with this agent. | [sql_agent_llama yaml](../../../tests/agent/sql_agent_llama.yaml) |
| SQL agent | `sql_agent` | GPT-4o-mini | database query tool is natively integrated using Langchain's [QuerySQLDataBaseTool](https://python.langchain.com/api_reference/community/tools/langchain_community.tools.sql_database.tool.QuerySQLDatabaseTool.html). User can also register their own tools with this agent. | [sql_agent yaml](../../../tests/agent/sql_agent_openai.yaml) |

### 1.3 Tools

@@ -47,63 +48,68 @@ The tools are registered with a yaml file. We support the following types of tools
Examples of how to register tools can be found in [Section 4](#-4-provide-your-own-tools) below.

### 1.4 Agent APIs

We support two sets of APIs that are OpenAI compatible:

1. OpenAI compatible chat completions API. Example usage with Python code below.

```python
import json

import requests

url = f"http://{ip_address}:{agent_port}/v1/chat/completions"

# single-turn, non-streaming -> used when the agent serves as a worker agent (i.e., a tool for a supervisor agent)
payload = {"messages": query, "stream": False}
resp = requests.post(url=url, json=payload, proxies=proxies, stream=False)

# multi-turn, streaming -> to interface with users
query = {"role": "user", "messages": user_message, "thread_id": thread_id, "stream": stream}
content = json.dumps(query)
resp = requests.post(url=url, data=content, proxies=proxies, stream=True)
for line in resp.iter_lines(decode_unicode=True):
    print(line)
```
2. OpenAI compatible assistants APIs.

   See example Python code [here](./test_assistant_api.py). There are 4 steps:

   Step 1. create an assistant: /v1/assistants

   Step 2. create a thread: /v1/threads

   Step 3. send a message to the thread: /v1/threads/{thread_id}/messages

   Step 4. run the assistant: /v1/threads/{thread_id}/runs

**Note**:

1. Currently only the `react_llama` agent is enabled for the assistants APIs.
2. Not all keywords of the OpenAI APIs are supported yet.
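The four steps above map to endpoint paths as follows — a minimal sketch (the helper name is hypothetical; `thread_id` comes from the thread created in Step 2):

```python
# Hypothetical helper: maps each assistants-API step to its endpoint path.
def assistants_endpoint(step: int, thread_id: str = "") -> str:
    paths = {
        1: "/v1/assistants",                    # Step 1: create an assistant
        2: "/v1/threads",                       # Step 2: create a thread
        3: f"/v1/threads/{thread_id}/messages", # Step 3: send a message to the thread
        4: f"/v1/threads/{thread_id}/runs",     # Step 4: run the assistant
    }
    return paths[step]

# e.g. assistants_endpoint(3, "abc123") -> "/v1/threads/abc123/messages"
```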

### 1.5 Agent memory

We currently support two types of memory.

1. `volatile`: agent memory is stored in RAM, so it is volatile. It holds agent states within a single thread and is used to enable multi-turn conversations between the user and the agent. Both the chat completions API and the assistants APIs support this type of memory.
2. `persistent`: agent memory is stored in a Redis database and holds agent states across all threads. Only the assistants APIs support this type of memory, which is likewise used to enable multi-turn conversations. In the future we will explore algorithms that leverage the information in previous conversations to improve the agent's performance.

**Note**: Currently only `react_llama` agent supports memory and multi-turn conversations.

#### How to enable agent memory?

Specify `with_memory=True`. To use persistent memory, also specify `memory_type=persistent`, and launch a Redis database using the command below.

```bash
# You can change the host port from 6379 to another one.
docker run -d -it --rm -p 6379:6379 --net=host --ipc=host --name redis-vector-db redis/redis-stack:7.2.0-v9
```

Examples of Python code for multi-turn conversations using agent memory:

1. [chat completions API with volatile memory](./test_chat_completion_multiturn.py)
2. [assistants APIs with persistent memory](./test_assistant_api.py)

To run the two examples above, first launch the agent microservice using [this docker compose yaml](../../../tests/agent/reactllama.yaml).
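As a sketch of what successive turns look like on the wire, the snippet below builds two chat-completions payloads that share a `thread_id` (a hypothetical helper, reusing the fields from the example in Section 1.4) — reusing the same `thread_id` is what ties the turns to one conversation in the agent's memory:

```python
import json


def multiturn_payload(user_message: str, thread_id: str, stream: bool = True) -> str:
    # Same fields as the chat-completions example above; the shared thread_id
    # links successive turns to a single conversation in agent memory.
    return json.dumps(
        {"role": "user", "messages": user_message, "thread_id": thread_id, "stream": stream}
    )


turn1 = multiturn_payload("What is OPEA project?", thread_id="t-001")
turn2 = multiturn_payload("Tell me more about it.", thread_id="t-001")  # same thread -> same memory
```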


## 🚀2. Start Agent Microservice

### 2.1 Build docker image for agent microservice
Expand Down Expand Up @@ -152,12 +158,14 @@ docker logs comps-agent-endpoint
Once the microservice starts, you can use the scripts below to invoke it.
### 3.1 Use chat completions API
For multi-turn conversations, first specify a `thread_id`.
```bash
export thread_id=<thread-id>
curl http://${ip_address}:9090/v1/chat/completions -X POST -H "Content-Type: application/json" -d '{
    "messages": "What is OPEA project?",
    "thread_id": "'"${thread_id}"'",
    "stream": true
}'