
Fixed concurrent problems for document indexing and user query #693

Merged
merged 35 commits into HKUDS:main from danielaskdd:fix-concurrent-problem on Feb 2, 2025

Conversation

danielaskdd
Contributor

@danielaskdd danielaskdd commented Feb 1, 2025

Problems Addressed

  1. When enable_llm_cache_for_entity_extract is True and enable_llm_cache is False, there is a race condition between document indexing and user queries running at the same time: extract_entities temporarily flips the global config enable_llm_cache to True, so concurrent user queries also temporarily use the cache.
  2. User queries are effectively blocked by document indexing. The limit_async_func_call decorator waits for a free slot by sleeping and polling, which is inefficient and gives no fairness guarantee, so user queries may not get a chance to run until document indexing completes.
  3. The disk persistence operations of NanoVectorDB are not thread-safe.

Solutions

  1. Entity Extraction Improvements:
  • Introduced a force_llm_cache parameter for the cache-handling function, forcing handle_cache to check the cache regardless of the global config settings.
  • Modified extract_entities to pass this parameter instead of mutating the global config, which controls the behavior of handle_cache without affecting concurrent queries.
  2. Concurrent Operation Optimization:
  • Replaced the concurrency-limiting mechanism of limit_async_func_call with asyncio.Semaphore, whose FIFO ordering prevents user queries from waiting indefinitely once document indexing has started (see the sketch after this list).
  • Removed the redundant semaphore logic from EmbeddingFunc.
  3. Introduced an asyncio.Lock to protect file save operations in NanoVectorDBStorage (also sketched below).
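
A minimal sketch of items 2 and 3, assuming a decorator named limit_async_func_call and a storage class holding a NanoVectorDB client; this is illustrative, not the PR's exact code:

```python
import asyncio
from functools import wraps


def limit_async_func_call(max_size: int):
    """Limit concurrent calls with asyncio.Semaphore instead of poll-and-sleep.

    Semaphore waiters are released in FIFO order, so a user query queued
    behind many indexing calls still gets a slot in turn rather than
    starving until indexing finishes.
    """
    def decorator(func):
        sem = asyncio.Semaphore(max_size)

        @wraps(func)
        async def wrapper(*args, **kwargs):
            async with sem:  # acquire one of max_size slots; released on exit
                return await func(*args, **kwargs)
        return wrapper
    return decorator


class NanoVectorDBStorageSketch:
    """Guard disk persistence with an asyncio.Lock so only one coroutine
    writes the vector DB file at a time."""

    def __init__(self, client):
        self._client = client            # e.g. a NanoVectorDB instance
        self._save_lock = asyncio.Lock()

    async def index_done_callback(self):
        async with self._save_lock:      # serialize concurrent save requests
            self._client.save()
```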

Bug fixes

  1. Fix quantize_embedding so it also accepts a plain Python list as the embedding parameter (see the sketch after this list).
  2. Fix prompt response cache failures when is_embedding_cache_enabled is true by adding cache_type to llm_response_cache and introducing a new extract cache type for the entity extraction phase.
  3. Fix LLM similarity verification failures by correcting how llm_model_func is retrieved.
  4. Update the similarity_check prompt to stop the LLM from occasionally generating two scores.
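
A minimal sketch of bug fix 1, assuming a quantize_embedding helper that scales an embedding into 8-bit integers; the signature and return values are illustrative, not the project's exact code:

```python
import numpy as np


def quantize_embedding(embedding, bits: int = 8):
    """Quantize an embedding vector into `bits`-bit integers.

    Converting a plain Python list to a numpy array up front is the point
    of the fix: without it, array methods such as .min()/.max() fail when
    a caller passes a list instead of an ndarray.
    """
    if isinstance(embedding, list):
        embedding = np.array(embedding)
    min_val, max_val = embedding.min(), embedding.max()
    scale = (max_val - min_val) / (2 ** bits - 1)
    if scale == 0:                 # constant vector; avoid division by zero
        scale = 1.0
    quantized = np.round((embedding - min_val) / scale).astype(np.uint8)
    return quantized, min_val, max_val
```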

Changes to API Server config at RAG initialization

  1. Disable LLM cache for entity extraction
  2. Enable embedding cache (both shown in the sketch below)
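
A minimal sketch of what the API server's rag initialization could look like after these changes, assuming LightRAG exposes the enable_llm_cache_for_entity_extract and embedding_cache_config parameters named in this PR; the surrounding arguments are placeholders:

```python
from lightrag import LightRAG  # assumed import path

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=llm_model_func,      # placeholder: your async LLM wrapper
    embedding_func=embedding_func,      # placeholder: your embedding wrapper
    enable_llm_cache_for_entity_extract=False,  # 1. disable LLM cache for extraction
    embedding_cache_config={
        "enabled": True,                # 2. enable embedding cache
        "similarity_threshold": 0.95,
    },
)
```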

- Abandon the approach of temporarily replacing the global llm_model_func configuration
- Introduce custom_llm function with new_config for handle_cache while extracting entities
- Update handle_cache to accept custom_llm
- Separate insert/query embedding funcs
- Add query-specific async limit
- Update storage classes to use new funcs
- Protect vector DB save with lock
- Improve config handling for thresholds
- Convert list to numpy array if needed
- Maintain existing functionality
- Introduce asyncio.Lock for save operations
- Ensure thread-safe file writes
…ormance.

- Replace custom counter with asyncio.Semaphore
- The existing implementation cannot follow the FIFO order
- Removed custom LLM function in entity extraction
- Simplified cache handling logic
- Added `force_llm_cache` parameter
- Updated cache handling conditions
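
The last two notes above describe the cache-handling change; a minimal sketch of the idea, assuming a handle_cache helper and a key-value storage exposing get_by_id (names and fields are illustrative, not the project's exact code):

```python
async def handle_cache(hashing_kv, args_hash, prompt,
                       cache_type=None, force_llm_cache=False):
    """Return a cached LLM response, or None on a cache miss.

    force_llm_cache overrides the global enable_llm_cache flag instead of
    mutating it, so entity extraction can use the cache without flipping
    shared config under concurrently running user queries.
    """
    if hashing_kv is None:
        return None
    enabled = force_llm_cache or hashing_kv.global_config.get("enable_llm_cache", False)
    if not enabled:
        return None
    cached = await hashing_kv.get_by_id(args_hash)
    if cached is not None and cached.get("cache_type") == cache_type:
        return cached["return"]
    return None
```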
danielaskdd marked this pull request as ready for review on February 1, 2025, 22:49
danielaskdd changed the title from "Fix concurrent problem for document indexing and user query" to "Fixed concurrent problems for document indexing and user query" on Feb 2, 2025
LarFii merged commit fade69a into HKUDS:main on Feb 2, 2025
1 check passed
danielaskdd deleted the fix-concurrent-problem branch on February 5, 2025, 18:32