Use ~900 turns of chat from the ShareGPT dataset to evaluate Memobase
- Selected the longest chat from the ShareGPT dataset (`sg_90k_part1.json`), ID "7uOhOjo". The chats can be found in `./sharegpt_test_7uOhOjo.json`.
- Ensure you have set up the Memobase Backend
- Run `pip install memobase rich`
- We use OpenAI gpt-4o-mini as the default model. Make sure you have an OpenAI API key and add it to `config.yaml`.
- Run `python run.py` (this will take some time), based on the Quickstart - Memobase.
- For comparison, we also tested against mem0 (version 0.1.2), another great memory-layer solution. The code is in `./run_mem0.py`, also using gpt-4o-mini as the default model.
- Feel free to raise issues about `run_mem0.py`. We wrote this script based on mem0's quickstart and it may not follow best practices. However, we also kept the Memobase process as basic as possible for a fair comparison.
- To simulate real-world usage, we combine each user+assistant exchange into a single turn when inserting into both Memobase and Mem0 (see the insertion sketch after this list).
- Using `tiktoken` to count tokens (model `gpt-4o`); see the token-counting sketch after this list.
- Total tokens in raw messages: 63,736
- Memobase:
  - Estimated costs:
    - Input tokens: ~220,000
    - Output tokens: ~15,000
  - Based on OpenAI's Dashboard, 900 turns of chat will cost approximately $0.042 (LLM costs)
  - Complete insertion takes 270-300 seconds (averaged over 3 tests)
- Mem0:
  - Based on OpenAI's Dashboard, 900 turns of chat will cost approximately $0.24 (LLM) + <$0.01 (embedding)
  - Complete insertion takes 1,683 seconds (single test)
- Mem0 uses hot-path updates, meaning each update triggers a memory flush. When using Mem0's `Memory.add`, you need to manually manage data insertion to avoid frequent memory flushes; Memobase includes a buffer zone that handles this automatically (see the Mem0 sketch after this list).
- This results in Mem0 making more LLM calls than Memobase, leading to higher costs and longer processing times.
- Additionally, Mem0 computes embeddings for each memory and retrieves them on every insertion, while Memobase doesn't use embeddings for user memory. Instead, Memobase uses dynamic profiling to generate primary and secondary indices for users and retrieves memories with SQL queries only.
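For reference, here is a minimal sketch of how the user+assistant pairing and insertion into Memobase can look. It assumes the `MemoBaseClient`/`ChatBlob` interface from the Memobase quickstart; the backend URL, API key, and example messages are placeholders, not values from this experiment.

```python
from memobase import MemoBaseClient, ChatBlob

# Two user/assistant exchanges, i.e. two "turns" in the sense used above
# (example content only).
chat = [
    "Hi, I want to book a table for a birthday party.",
    "Sure! Which date are you planning for?",
    "March 22, but I haven't decided the menu yet.",
    "Got it, I'll keep that in mind.",
]

# Connect to a locally running Memobase backend (URL and key are placeholders;
# use the values from your own config.yaml / deployment).
client = MemoBaseClient(project_url="http://localhost:8019", api_key="secret")
user = client.get_user(client.add_user())

# Pair each user message with the assistant reply that follows it and insert
# the pair as a single ChatBlob, mirroring the single-turn setup described above.
for user_msg, assistant_msg in zip(chat[0::2], chat[1::2]):
    user.insert(ChatBlob(messages=[
        {"role": "user", "content": user_msg},
        {"role": "assistant", "content": assistant_msg},
    ]))

# Memobase buffers inserts; ask it to process anything still pending.
user.flush()
```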
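The token count above can be reproduced with `tiktoken` roughly as follows. The file path and the ShareGPT message schema (`{"from": ..., "value": ...}`) are assumptions about the test file, not guaranteed by this document.

```python
import json
import tiktoken

# gpt-4o maps to the o200k_base encoding; encoding_for_model resolves it for us.
enc = tiktoken.encoding_for_model("gpt-4o")

# Assumed schema: a list of ShareGPT-style entries with a "value" text field.
with open("./sharegpt_test_7uOhOjo.json") as f:
    messages = json.load(f)

total = sum(len(enc.encode(m["value"])) for m in messages)
print(f"Total tokens in raw messages: {total}")
```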
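And a minimal sketch of the Mem0 side of the comparison, reusing the `chat` list from the Memobase sketch above. Only `Memory.add` and `Memory.get_all` are named in this document; the default `Memory()` constructor and the string form of `add` are assumptions based on mem0's quickstart, so check `./run_mem0.py` for the exact calls.

```python
from mem0 import Memory

m = Memory()  # default configuration; the experiment configured gpt-4o-mini

# Every add() here is a hot-path update: mem0 extracts memories with the LLM
# and embeds them immediately, so each turn pays LLM + embedding calls.
for user_msg, assistant_msg in zip(chat[0::2], chat[1::2]):
    m.add(f"user: {user_msg}\nassistant: {assistant_msg}", user_id="7uOhOjo")
```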
The user profile is below (sensitive information masked with `*`):
* basic_info: language_spoken - User uses both English and Korean.
* basic_info: name - 오*영
* contact_info: email - s****2@cafe24corp.com
* demographics: marital_status - user is married
* education: - User had an English teacher who emphasized capitalization...
You can view the full profile here.
Take a look at the more structured profiles:
```python
[
    UserProfile(
        topic='demographics',
        sub_topic='marital_status',
        content='user is married'
        ...
    )
    ...
]
```
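A short sketch of how such structured profiles can be fetched, reusing the `user` handle from the insertion sketch above. The `profile()` accessor is assumed from the Memobase quickstart; the attribute names follow the `UserProfile` fields shown above and may differ in your SDK version.

```python
# Print the structured profile entries Memobase generated for this user.
for p in user.profile():
    print(p.topic, p.sub_topic, p.content)
```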
We list some of the memories below (`Memory.get_all`):
- The restaurant is awesome
- User is interested in the lyrics of 'Home Sweet Home' by Motley Crue
- In Korea, people use '^^' to express smile
- Reservation for a birthday party on March 22
- Did not decide the menu...
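These were listed via mem0's `Memory.get_all`, reusing the `m` instance from the Mem0 sketch above; the exact shape of the returned records is version-dependent, so this is only a sketch.

```python
# List every memory mem0 extracted for this user and print the raw records.
for mem in m.get_all(user_id="7uOhOjo"):
    print(mem)
```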
The full results are here.