Bug in AzureAISearch Vector Store: user_id Filter Not Working Correctly #2170

Open
junmo1215 opened this issue Jan 22, 2025 · 1 comment · May be fixed by #2171

Comments

@junmo1215

🐛 Describe the bug

When azure_ai_search is configured as the vector_store in the mem0 project, adding a memory via m.add(xxx, user_id="") does not respect the user_id filter. Specifically, an add intended for one user overwrites another user's existing memory.

Sample code:

import time
from pprint import pprint

from mem0 import Memory

m = Memory.from_config(config_dict={
    "version": "v1.1",
    "vector_store": {
        "provider": "azure_ai_search",
        "config": {
            "service_name": "",
            "api_key": "",
            "collection_name": "", 
            "embedding_model_dims": 3072,
            "use_compression": False
        }
    }
})

# Add memory for Alice
m_alice = m.add("my name is Alice", user_id="Alice")
print("m_alice: ", m_alice)

# Add memory for Bob
m_bob = m.add("my name is Bob", user_id="Bob")
print("m_bob: ", m_bob)

# Wait for the operations to complete
time.sleep(10)

# Retrieve all memories
print("Final:")
pprint(m.get_all())

Output

m_alice:  {'results': [{'id': '94a4ce36-84bd-4c77-80da-3ed62bdde35c', 'memory': 'Name is Alice', 'event': 'ADD'}], 'relations': []}
m_bob:  {'results': [{'id': '94a4ce36-84bd-4c77-80da-3ed62bdde35c', 'memory': 'Name is Bob', 'event': 'UPDATE', 'previous_memory': 'Name is Alice'}], 'relations': []}
Final:
{'results': [{'created_at': '2025-01-22T00:19:10.999547-08:00',
              'hash': '2c6f48df7e8d4ea366914773ca57b8b4',
              'id': '94a4ce36-84bd-4c77-80da-3ed62bdde35c',
              'memory': 'Name is Bob',
              'metadata': None,
              'updated_at': '2025-01-22T00:19:13.253396-08:00',
              'user_id': 'Alice'}]}

Issue Details:

As shown in the output:

  • Adding memory for Alice works as expected.
  • When adding memory for Bob, instead of creating a new memory entry, it updates Alice's existing memory.
  • The event for Bob's operation is 'UPDATE' with 'previous_memory': 'Name is Alice'.
  • In the final output, there's only one memory entry with user_id: 'Alice', but the memory content is 'Name is Bob'.

Expected Behavior:

Each user should have their own separate memory entries. Adding a memory for Bob should create a new entry associated with user_id: 'Bob', without affecting Alice's memory.

Actual Behavior:

Bob's memory addition updates Alice's existing memory instead of creating a new one. This indicates that the user_id filter is not functioning properly in the azure_ai_search vector store implementation, leading to cross-user data contamination.

Additional Information

Image

@junmo1215
Author

After looking into the implementation of mem0/vector_stores/azure_ai_search.py, I noticed that the index is created with only three fields: id, vector, and payload. Because user_id lives inside the serialized payload, the current approach fetches the top `limit` nearest neighbors first and only then tries to filter them in Python.

def search(self, query, limit=5, filters=None):
    vector_query = VectorizedQuery(vector=query, k_nearest_neighbors=limit, fields="vector")     
    search_results = self.search_client.search(vector_queries=[vector_query], top=limit)

    results = []
    for result in search_results:
        payload = json.loads(result["payload"])
        if filters:
            for key, value in filters.items():
                if key not in payload or payload[key] != value:
                    continue
        results.append(OutputData(id=result["id"], score=result["@search.score"], payload=payload))
    return results

This method has two issues:

  1. Filtering order:
  • The limit (top=limit) is applied before filtering, so documents that match the filters can be cut off before they are ever checked.
  2. Ineffective filtering:
  • The continue statement only skips to the next iteration of the inner loop over filters.items(), not the outer loop over search_results, so every result is appended regardless of user_id.
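
To make the second point concrete, here is a minimal sketch of what a working client-side check could look like. It is not the mem0 implementation: matches_filters and filter_payloads are hypothetical helper names, and the payloads are made-up dictionaries standing in for the decoded payload field.

```python
from typing import Any, Dict, List, Optional


def matches_filters(payload: Dict[str, Any], filters: Optional[Dict[str, Any]]) -> bool:
    """Return True only if every filter key exists in payload with an equal value."""
    if not filters:
        return True
    return all(key in payload and payload[key] == value for key, value in filters.items())


def filter_payloads(
    payloads: List[Dict[str, Any]], filters: Optional[Dict[str, Any]] = None
) -> List[Dict[str, Any]]:
    """Keep only the payloads that satisfy all filters (hypothetical helper)."""
    return [payload for payload in payloads if matches_filters(payload, filters)]


# Made-up payloads standing in for json.loads(result["payload"]).
payloads = [
    {"user_id": "Alice", "data": "Name is Alice"},
    {"user_id": "Bob", "data": "Name is Bob"},
]
print(filter_payloads(payloads, {"user_id": "Bob"}))
```

Moving the check into its own function avoids the inner-loop continue entirely. It does not address the first point, since the limit would still be applied before this check runs.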

These issues might be causing the problem where one user's memory updates another's memory. I'll submit a pull request soon to modify the implementation and address these concerns.
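
One possible direction for the filtering-order problem is to let Azure AI Search apply the filter itself, so that the limit is taken from the already-filtered set. The sketch below only builds an OData filter string. build_odata_filter is a hypothetical helper, and it assumes the index would be extended with filterable top-level fields such as user_id, which the current three-field index does not have.

```python
from typing import Any, Dict, Optional


def build_odata_filter(filters: Optional[Dict[str, Any]]) -> Optional[str]:
    """Turn {"user_id": "Bob"} into the OData expression "user_id eq 'Bob'".

    Hypothetical helper; assumes each key is a filterable field in the index.
    """
    if not filters:
        return None
    clauses = []
    for key, value in filters.items():
        if isinstance(value, bool):
            literal = "true" if value else "false"
        elif isinstance(value, (int, float)):
            literal = str(value)
        else:
            # OData string literals use single quotes; embedded quotes are doubled.
            literal = "'" + str(value).replace("'", "''") + "'"
        clauses.append(f"{key} eq {literal}")
    return " and ".join(clauses)


print(build_odata_filter({"user_id": "Bob"}))
```

The resulting string could then be passed through the filter keyword of search_client.search alongside vector_queries and top, provided the index schema exposes those fields as filterable.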

junmo1215 added a commit to junmo1215/mem0 that referenced this issue Jan 22, 2025
@junmo1215 junmo1215 linked a pull request Jan 22, 2025 that will close this issue
11 tasks