
UI update #273

Merged
merged 27 commits
Jan 8, 2025
Commits
4077048
Update langchain requirement from <0.3.0,>=0.2.5 to >=0.2.5,<0.4.0 (#…
dependabot[bot] Oct 16, 2024
cb6b70d
[Automated] Merge release into main (#235)
ProKil Oct 19, 2024
19d39e0
update macos test runner as macos-latest (#238)
ProKil Oct 21, 2024
7d2ab2f
feat(exp_eval): support tag for combo iteration (#245)
lwaekfjlk Nov 11, 2024
a1e025e
Adding OpenHands node
akhatua2 Nov 12, 2024
ae72e44
[autofix.ci] apply automated fixes
autofix-ci[bot] Nov 12, 2024
dfeaa82
Added LLM Agent Node
akhatua2 Nov 12, 2024
879da88
[autofix.ci] apply automated fixes
autofix-ci[bot] Nov 12, 2024
d15509d
removing openhands
akhatua2 Nov 13, 2024
96f8b6c
[autofix.ci] apply automated fixes
autofix-ci[bot] Nov 13, 2024
d72e21e
Correcting mypy error in tick agent
akhatua2 Nov 13, 2024
b5facf2
Moving everything to examples since its not specific to sotopia
akhatua2 Nov 15, 2024
558e582
[autofix.ci] apply automated fixes
autofix-ci[bot] Nov 15, 2024
f687c49
Adding some additional final checks
akhatua2 Nov 15, 2024
3a9e8a0
Renaming to interview openhands
akhatua2 Nov 15, 2024
7d51416
Adding documentation
akhatua2 Nov 16, 2024
8e0d9da
[autofix.ci] apply automated fixes
autofix-ci[bot] Nov 16, 2024
fa65167
Updating the readme
akhatua2 Nov 16, 2024
c2e3656
[autofix.ci] apply automated fixes
autofix-ci[bot] Nov 16, 2024
53e0798
Merge pull request #248 from sotopia-lab/openhands_integration
akhatua2 Nov 16, 2024
df95426
feat: Initial Message for LLM agnets in Sotopia Aact (Experimental) (…
akhatua2 Nov 16, 2024
8f98ad8
feat: Adding Chat print node for pretty printing chat conversation be…
akhatua2 Nov 16, 2024
07ee2c4
doc: API endpoints for sotopia (#242)
XuhuiZhou Nov 22, 2024
d55ec34
feat: FastAPI Implementation of Sotopia Part One (wo websocket) (#246)
XuhuiZhou Nov 26, 2024
dbd8294
update the instruction to load existing data via docker (#255)
XuhuiZhou Nov 26, 2024
61f190e
fix doc (#258)
XuhuiZhou Nov 30, 2024
d7724db
Sotopia API and UI (#264)
XuhuiZhou Jan 8, 2025
1 change: 1 addition & 0 deletions .github/.codecov.yml
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ ignore:
- ".github" # ignore the .github directory
- "docs" # ignore the docs directory
- "figs" # ignore the figs directory
- "ui" # ignore the ui directory

coverage:
status:
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/cli_tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ jobs:
strategy:
max-parallel: 5
matrix:
os: [ubuntu-latest, macos-13]
os: [ubuntu-latest, macos-latest]

runs-on: ${{ matrix.os }}

Expand All @@ -38,7 +38,7 @@ jobs:
run: |
python -m pip install --upgrade pip
python -m pip install uv
uv sync --extra test --extra chat
uv sync --extra test --extra api
- name: Test with pytest
run: |
uv run pytest tests/cli/test_install.py --cov=. --cov-report=xml
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/mypy.yml
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ jobs:
run: |
python -m pip install --upgrade pip
python -m pip install uv
uv sync --extra test --extra chat
uv sync --extra test --extra api
- name: Type-checking package with mypy
run: |
# Run this mypy instance against our main package.
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/tests.sh
Original file line number Diff line number Diff line change
@@ -1 +1 @@
uv run --extra test --extra chat pytest --ignore tests/cli --cov=. --cov-report=xml
uv run --extra test --extra api pytest --ignore tests/cli --cov=. --cov-report=xml
2 changes: 1 addition & 1 deletion .github/workflows/tests_in_docker.yml
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ jobs:
- name: Docker Compose
run: docker compose -f .devcontainer/docker-compose.yml up -d
- name: Run tests
run: docker compose -f .devcontainer/docker-compose.yml run --rm -u root -v /home/runner/work/sotopia/sotopia:/workspaces/sotopia devcontainer /bin/sh -c "cd /workspaces/sotopia; ls; uv sync --extra test --extra chat; uv run pytest --ignore tests/cli --cov=. --cov-report=xml"
run: docker compose -f .devcontainer/docker-compose.yml run --rm -u root -v /home/runner/work/sotopia/sotopia:/workspaces/sotopia devcontainer /bin/sh -c "cd /workspaces/sotopia; ls; uv sync --extra test --extra api; uv run pytest --ignore tests/cli --cov=. --cov-report=xml"
- name: Upload coverage report to Codecov
uses: codecov/[email protected]
with:
Expand Down
116 changes: 116 additions & 0 deletions docs/pages/concepts/evaluation_dimension.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
## Overview

Evaluation dimensions are used to evaluate the quality of social interactions.
In the original Sotopia paper, there are seven dimensions for evaluating the quality of social interactions, which we collectively name the `sotopia` evaluation dimensions:
- believability
- relationship
- knowledge
- secret
- social rules
- financial and material benefits
- goal

The `SotopiaDimensions` can be used directly without initializing the database. It provides a set of predefined evaluation dimensions that are ready to use for evaluating social interactions. For example,

```python
from sotopia.envs.parallel import ParallelSotopiaEnv
from sotopia.envs.evaluators import EvaluationForTwoAgents, ReachGoalLLMEvaluator, RuleBasedTerminatedEvaluator, SotopiaDimensions

env = ParallelSotopiaEnv(
env_profile=env_profile,
model_name=model_names["env"],
action_order="round-robin",
evaluators=[
RuleBasedTerminatedEvaluator(max_turn_number=20, max_stale_turn=2),
],
terminal_evaluators=[
ReachGoalLLMEvaluator(
model_names["env"],
EvaluationForTwoAgents[SotopiaDimensions], # type: ignore
# TODO check how to do type annotation
),
],
)
```


However, in many use cases people may want to evaluate with customized metrics, so we also provide a way to build custom evaluation dimensions.
For a quick reference, you can directly check out `examples/use_custom_dimensions.py`.

### CustomEvaluationDimension
The [`CustomEvaluationDimension`](/python_API/database/evaluation_dimensions) is a class that can be used to create a custom evaluation dimension.
There are four parameters:
- name: the name of the dimension
- description: the description of the dimension
- range_low: the minimum score of the dimension (should be an integer)
- range_high: the maximum score of the dimension (should be an integer)
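As a quick illustration, a dimension is just these four fields together. The sketch below uses a plain-Python stand-in rather than the redis-backed class itself, and the `transactivity` dimension shown is a hypothetical example:

```python
from dataclasses import dataclass

# Plain-Python stand-in for CustomEvaluationDimension's four parameters;
# the real class is redis-backed and persisted with a primary key.
@dataclass
class DimensionSpec:
    name: str
    description: str
    range_low: int   # minimum score (integer)
    range_high: int  # maximum score (integer)

transactivity = DimensionSpec(
    name="transactivity",
    description="The extent to which participants build on each other's reasoning",
    range_low=0,
    range_high=10,
)
print(transactivity.name, transactivity.range_low, transactivity.range_high)
```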

### CustomEvaluationDimensionList
The [`CustomEvaluationDimensionList`](/python_API/database/evaluation_dimensions) is a class that can be used to create a custom evaluation dimension list from existing dimensions. It groups multiple dimensions together for a specific use case.
There are two parameters:
- name: the name of the dimension list
- dimension_pks: the primary keys of the dimensions in the dimension list

### EvaluationDimensionBuilder
The [`EvaluationDimensionBuilder`](/python_API/database/evaluation_dimensions) is a class that can be used to generate a custom evaluation dimension model based on the existing dimensions.


## Usage
### Initialize the database
The default evaluation metric is still `SotopiaDimensions` in `sotopia.envs.evaluators`. There is no `CustomEvaluationDimension` in the database by default. To initialize the database, please refer to `examples/use_custom_dimensions.py`.


### Use the custom evaluation dimensions
After you initialize your customized evaluation dimensions, you can choose to use any one of these methods provided below:

#### Method 1: Choose dimensions by names
```python
evaluation_dimensions = (
EvaluationDimensionBuilder.select_existing_dimension_model_by_name(
["transactivity", "verbal_equity"]
)
)
```

#### Method 2: Directly choose the grouped evaluation dimension list
```python
evaluation_dimensions = (
EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
"sotopia"
)
)
```

#### Method 3: Build a custom evaluation dimension model temporarily
We provide multiple ways to build a custom evaluation dimension model with `EvaluationDimensionBuilder`, specifically:
- `generate_dimension_model`: build an evaluation dimension from existing dimension primary keys.
- `generate_dimension_model_from_dict`: build an evaluation dimension from a dictionary that specifies the parameters of the `CustomEvaluationDimension`. For example:
```json
[
{
"name": "believability",
"description": "The believability of the interaction",
"range_low": 0,
"range_high": 10
},
...
]
```
- `select_existing_dimension_model_by_name`: build an evaluation dimension model from existing dimension names, e.g. `['believability', 'goal']`.
- `select_existing_dimension_model_by_list_name`: build an evaluation dimension model from an existing `CustomEvaluationDimensionList` name, e.g. `sotopia`.
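To make the range semantics concrete, here is a minimal sketch of what a range validator like `create_range_validator` enforces. This is an illustrative reimplementation under assumed behavior, not sotopia's actual code:

```python
# Illustrative sketch of range validation for dimension scores;
# not sotopia's actual implementation.
def create_range_validator(low: int, high: int):
    def validate(value: int) -> int:
        if not (low <= value <= high):
            raise ValueError(f"score {value} outside [{low}, {high}]")
        return value
    return validate

believability_validator = create_range_validator(0, 10)
print(believability_validator(7))  # accepted: within [0, 10]
try:
    believability_validator(11)
except ValueError as e:
    print("rejected:", e)
```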


After you get the evaluation dimension model, you can pass it as a parameter to the evaluator, for example:
```python
evaluation_dimensions = (
EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
"sotopia"
)
)
terminal_evaluators=[
ReachGoalLLMEvaluator(
model_names["env"],
EvaluationForTwoAgents[evaluation_dimensions], # type: ignore
),
],
```
6 changes: 5 additions & 1 deletion docs/pages/concepts/generation.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -47,4 +47,8 @@ In this example, we generate a list of the first `n` prime numbers with the `gpt

Apart from using api endpoints from LLM providers like OpenAI, Together AI, Azure, etc., you can also use custom model with OpenAI compatible endpoints.
You will need to set the model name to `custom/<model_name>@url`, and CUSTOM_API_KEY to the API key of the custom model.
For an example, check out `examples/generation_api/custom_model.py`.

For example, suppose you want an agent to use Meta's [`llama3.2`](https://www.meta.com/llama/) model, hosted behind a [LiteLLM](https://github.com/BerriAI/litellm) proxy server running on `http://0.0.0.0:4000`. You can then set `model_name="custom/llama3.2:1b@http://0.0.0.0:4000"` to call the model in the [`LLMAgent`](/python_API/agents/llm_agent#llmagent).

For more information, check out `examples/generation_api/custom_model.py`.
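As a sanity check, the `custom/<model_name>@<url>` string splits cleanly into its three parts. This is the assumed format; see `examples/generation_api/custom_model.py` for the authoritative usage:

```python
# Decompose a custom model string of the assumed form custom/<model_name>@<url>.
model = "custom/llama3.2:1b@http://0.0.0.0:4000"
prefix, rest = model.split("/", 1)       # "custom", "llama3.2:1b@http://0.0.0.0:4000"
model_name, url = rest.split("@", 1)     # "llama3.2:1b", "http://0.0.0.0:4000"
print(prefix)      # custom
print(model_name)  # llama3.2:1b
print(url)         # http://0.0.0.0:4000
```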
2 changes: 1 addition & 1 deletion docs/pages/contribution/contribution.md
Original file line number Diff line number Diff line change
Expand Up @@ -133,7 +133,7 @@ Please refer to [Dev Containers](https://containers.dev/supporting#editors) to s

You can also set up the development environment without Dev Containers. There are three things you will need to set up manually:

- Python and uv: Please start from an environment supporting Python 3.10+ and install uv using `pip install uv; uv sync --all-extra`.
- Python and uv: Please start from an environment supporting Python 3.10+ and install uv using `pip install uv; uv sync --all-extras`. (Note that this will install all the extra dependencies)
- Redis: Please refer to introduction page for the set up of Redis.
- Local LLM (optional): If you don't have access to model endpoints (e.g. OpenAI, Anthropic or others), you can use a local model. You can use Ollama, Llama.cpp, vLLM or many others which support OpenAI compatible endpoints.

Expand Down
6 changes: 6 additions & 0 deletions docs/pages/examples/deployment.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# Deploy Sotopia Python API to Modal
We offer a script to deploy the Sotopia Python API to [Modal](https://modal.com/).
To do so, go to the `sotopia/sotopia/ui` directory and run the following command:
```bash
modal deploy sotopia/ui/modal_api_server.py
```
11 changes: 8 additions & 3 deletions docs/pages/index.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,7 @@ export REDIS_OM_URL="redis://localhost:6379"
```
if you are developing Sotopia using uv, you can sync your dependencies with
```bash
uv sync --extra examples --extra chat
uv sync --extra examples --extra api
```
</AccordionContent>
</AccordionItem>
Expand All @@ -144,13 +144,18 @@ or manual setup:
<AccordionItem value="item-1">
<AccordionTrigger>Docker is my thing.</AccordionTrigger>
<AccordionContent>
Please follow the [instruction](https://redis.io/docs/stack/get-started/install/docker/) to start a redis-stack server or use an existing server. You can also check [Q&A](/docs/troubleshooting.md) to initiate the redis server with the Sotopia data.
Please follow the [instruction](https://redis.io/docs/stack/get-started/install/docker/) to start a redis-stack server or use an existing server. If you want to use the existing data in Sotopia, you can download the `dump.rdb` file from [here](https://cmu.box.com/shared/static/xiivc5z8rnmi1zr6vmk1ohxslylvynur). Feel free to check more datasets related to Sotopia [here](https://huggingface.co/collections/cmu-lti/sotopia-65f312c1bd04a8c4a9225e5b).

After downloading the `dump.rdb` file, make a `redis-data` folder in a desired `<your_path>` directory. Then you can start the server with the following command:
```bash
docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 -v <your_path>/redis-data:/data/ redis/redis-stack:latest
```

The `REDIS_OM_URL` needs to be set before loading and saving agents:
```bash
conda env config vars set REDIS_OM_URL="redis://user:password@host:port"
```
</AccordionContent>
</AccordionItem>
<AccordionItem value="item-2">
<AccordionTrigger>No, I don't want to use Docker.</AccordionTrigger>
Expand Down
54 changes: 54 additions & 0 deletions docs/pages/python_API/database/evaluation_dimensions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# `evaluation_dimensions.py`

This module provides classes and utilities for defining and managing custom evaluation dimensions within the Sotopia environment. It includes classes for individual dimensions, lists of dimensions, and a builder for creating dimension models.

## Classes

### `CustomEvaluationDimension`

Represents a custom evaluation dimension with specific attributes such as name, description, and score range.

#### Attributes
- `name`: `str`. The name of the dimension.
- `description`: `str`. A brief description of the dimension.
- `range_low`: `int`. The minimum score for the dimension.
- `range_high`: `int`. The maximum score for the dimension.

### `CustomEvaluationDimensionList`

Groups multiple custom evaluation dimensions together.

#### Attributes
- `name`: `str`. The name of the dimension list.
- `dimension_pks`: `list[str]`. A list of primary keys for the dimensions included in the list.
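Conceptually, a dimension list is just a named set of primary keys. The in-memory sketch below is a stand-in (the real classes are redis-backed, and the primary-key values are hypothetical):

```python
# In-memory stand-in showing how a dimension list resolves its members
# by primary key; the real classes persist to Redis.
dimensions_by_pk = {
    "pk-believability": {"name": "believability", "range_low": 0, "range_high": 10},
    "pk-goal": {"name": "goal", "range_low": 0, "range_high": 10},
}
sotopia_list = {"name": "sotopia", "dimension_pks": ["pk-believability", "pk-goal"]}
selected = [dimensions_by_pk[pk]["name"] for pk in sotopia_list["dimension_pks"]]
print(selected)  # ['believability', 'goal']
```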

### `EvaluationDimensionBuilder`

Provides utility methods to create and manage evaluation dimension models.

#### Methods
- `create_range_validator(low: int, high: int)`: Creates a validator for score ranges.

**Arguments:**
- `low`: `int`. The minimum score allowed.
- `high`: `int`. The maximum score allowed.

- `build_dimension_model(dimension_ids: list[str])`: Builds a dimension model from primary keys.

**Arguments:**
- `dimension_ids`: `list[str]`. A list of dimension primary keys.

- `build_dimension_model_from_dict(dimensions: list[dict[str, Union[str, int]]])`: Builds a dimension model from a dictionary.

**Arguments:**
- `dimensions`: `list[dict[str, Union[str, int]]]`. A list of dictionaries specifying dimension attributes.

- `select_existing_dimension_model_by_name(dimension_names: list[str])`: Selects a dimension model by dimension names.

**Arguments:**
- `dimension_names`: `list[str]`. A list of dimension names.

- `select_existing_dimension_model_by_list_name(list_name: str)`: Selects a dimension model by list name.

**Arguments:**
- `list_name`: `str`. The name of the dimension list.
30 changes: 27 additions & 3 deletions examples/experiment_eval.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
EnvAgentComboStorage,
EnvironmentProfile,
EpisodeLog,
EvaluationDimensionBuilder,
)
from sotopia.envs.evaluators import (
EvaluationForTwoAgents,
Expand All @@ -34,6 +35,7 @@
)
from sotopia.server import run_async_server
from sotopia_conf.gin_utils import parse_gin_flags, run
# from sotopia.database import EvaluationDimensionBuilder

_DEFAULT_GIN_SEARCH_PATHS = [
os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
Expand Down Expand Up @@ -109,6 +111,18 @@ def _iterate_env_agent_combo_not_in_db(
tag: str | None = None,
) -> Generator[EnvAgentCombo[Observation, AgentAction], None, None]:
"""We iterate over each environment and return the **first** env-agent combo that is not in the database."""
# loading evaluation metric
try:
evaluation_dimensions = EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
"sotopia"
) # Initialize your customized dimension, please refer to `examples/use_custom_dimensions.py`
except Exception as e:
print(
"No customized evaluation dimensions found, using default SotopiaDimensions",
e,
)
evaluation_dimensions = SotopiaDimensions

if not env_ids:
env_ids = list(EnvironmentProfile.all_pks())
for env_id in env_ids:
Expand All @@ -123,6 +137,11 @@ def _iterate_env_agent_combo_not_in_db(
)
assert env_agent_combo_storage_list
first_env_agent_combo_storage_to_run: EnvAgentComboStorage | None = None

env_agent_combo_storage_list = sorted(
env_agent_combo_storage_list, key=lambda x: str(x.pk)
)

for env_agent_combo_storage in env_agent_combo_storage_list:
env_agent_combo_storage = cast(
EnvAgentComboStorage, env_agent_combo_storage
Expand All @@ -147,7 +166,8 @@ def _iterate_env_agent_combo_not_in_db(
terminal_evaluators=[
ReachGoalLLMEvaluator(
model_names["env"],
EvaluationForTwoAgents[SotopiaDimensions],
EvaluationForTwoAgents[evaluation_dimensions], # type: ignore
# TODO check how to do type annotation
),
],
)
Expand Down Expand Up @@ -183,10 +203,14 @@ def run_async_server_in_batch(
logger.removeHandler(rich_handler)

# we cannot get the exact length of the generator, we just give an estimate of the length
env_agent_combo_iter = _iterate_env_agent_combo_not_in_db(model_names=model_names)
env_agent_combo_iter = _iterate_env_agent_combo_not_in_db(
model_names=model_names, tag=tag
)
env_agent_combo_iter_length = sum(1 for _ in env_agent_combo_iter)

env_agent_combo_iter = _iterate_env_agent_combo_not_in_db(model_names=model_names)
env_agent_combo_iter = _iterate_env_agent_combo_not_in_db(
model_names=model_names, tag=tag
)
env_agent_combo_batch: list[EnvAgentCombo[Observation, AgentAction]] = []

while True:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
from aact import Message, NodeFactory
from aact.messages import Text, Tick, DataModel, DataModelFactory
from sotopia.agents.llm_agent import ainput
from sotopia.experimental.agents import BaseAgent
from sotopia.experimental.agents.base_agent import BaseAgent

from sotopia.generation_utils import agenerate
from sotopia.generation_utils.generate import StrOutputParser
Expand Down