
Fixes for concurrent document indexing and querying #684

Closed

Conversation

@danielaskdd (Contributor) commented on Jan 31, 2025

Problems Addressed

  1. When enable_llm_cache_for_entity_extract is True and enable_llm_cache is False, there is a race condition when document indexing and a user query run at the same time: extract_entities temporarily modifies the global config enable_llm_cache to True, causing the user query to also use the cache during that window.
  2. Performance bottleneck during document indexing due to LLM function instance exhaustion, causing user query latency. The limit_async_func_call decorator waits by sleeping in a loop until another task releases a slot, so under sustained indexing load a user query may never win a freed slot until indexing is over (see the sketch after this list).
  3. Thread safety issues in NanoVectorDB's disk persistence operations.
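For context, here is a minimal reconstruction of the sleep-polling limiter described in problem 2; the names and the exact loop are assumptions, not LightRAG's verbatim code:

```python
import asyncio
from functools import wraps

# Hypothetical reconstruction of a sleep-polling call limiter.
# Waiters poll in a loop, so wake-up order is arbitrary: a steady stream
# of indexing calls can keep grabbing freed slots while a query call
# keeps sleeping and re-losing the race.
def limit_async_func_call_polling(max_size: int, wait_seconds: float = 0.1):
    def decorator(func):
        current = 0

        @wraps(func)
        async def wrapper(*args, **kwargs):
            nonlocal current
            while current >= max_size:        # busy-wait, no FIFO fairness
                await asyncio.sleep(wait_seconds)
            current += 1
            try:
                return await func(*args, **kwargs)
            finally:
                current -= 1

        return wrapper
    return decorator
```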

Solutions

  1. Entity Extraction Improvements:
  • Introduce a force_llm_cache parameter to handle_cache, forcing the function to handle the cache no matter what the global config settings are.
  • Modify extract_entities to use this new parameter to control the behavior of handle_cache.
  2. Concurrent Operation Optimization:
  • Replace the function call limiting mechanism of limit_async_func_call with asyncio.Semaphore, ensuring FIFO order to prevent query starvation during indexing (sketched below).
  • Remove the redundant semaphore logic from EmbeddingFunc.
  3. Introduce an asyncio.Lock to protect file save operations in NanoVectorDBStorage.

The above changes improve the reliability of concurrent document indexing and querying while maintaining system performance and improving thread safety.
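A minimal sketch of the semaphore-based limiter from solution 2; the decorator name matches the PR, while the body is an assumption about the implementation:

```python
import asyncio
from functools import wraps

# asyncio.Semaphore queues waiters internally and wakes them in FIFO
# order, so a query issued during heavy indexing is served in turn
# instead of repeatedly losing the race to newly arriving indexing calls.
def limit_async_func_call(max_size: int):
    def decorator(func):
        sem = asyncio.Semaphore(max_size)

        @wraps(func)
        async def wrapper(*args, **kwargs):
            async with sem:                   # released even if func raises
                return await func(*args, **kwargs)

        return wrapper
    return decorator
```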

Bug fixes

  1. Fix quantize_embedding not supporting embedding vectors passed as plain Python lists (see the sketch after this list).
  2. Fix missing cache_type in llm_response_cache, and add a new cache_type, extract, for the entity extraction LLM response cache.
  3. Fix prompt response cache failure when is_embedding_cache_enabled is true.
  4. Fix llm_model_func retrieval error in handle_cache; this function is used for LLM similarity verification.
  5. Update the similarity_check prompt to avoid occasionally generating two scores.
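A hedged sketch of bug fix 1, matching the "Convert list to numpy array if needed" commit below; the min-max quantization details are assumptions about the surrounding code, not LightRAG's exact scheme:

```python
import numpy as np

def quantize_embedding(embedding, bits: int = 8):
    # The fix: accept a plain Python list as well as an np.ndarray.
    embedding = np.asarray(embedding, dtype=np.float32)
    # Min-max quantization to uint8 (illustrative of the existing scheme).
    mn, mx = float(embedding.min()), float(embedding.max())
    scale = (mx - mn) / (2**bits - 1) or 1.0   # guard against a flat vector
    quantized = np.round((embedding - mn) / scale).astype(np.uint8)
    return quantized, mn, mx
```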

- Abandon the approach of temporarily replacing the global llm_model_func configuration
- Introduce custom_llm function with new_config for handle_cache while extracting entities
- Update handle_cache to accept custom_llm
- Separate insert/query embedding funcs
- Add query-specific async limit
- Update storage classes to use new funcs
- Protect vector DB save with lock
- Improve config handling for thresholds
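A hedged sketch of the "Separate insert/query embedding funcs" and "Add query-specific async limit" commits above: the same embedding callable is wrapped twice, each wrapper with its own concurrency budget, so bulk indexing cannot exhaust the slots an interactive query needs. All names here are illustrative, not LightRAG's actual wiring:

```python
import asyncio
from functools import wraps

def with_limit(func, max_size: int):
    sem = asyncio.Semaphore(max_size)

    @wraps(func)
    async def wrapper(*args, **kwargs):
        async with sem:
            return await func(*args, **kwargs)

    return wrapper

async def embed(texts: list[str]) -> list[list[float]]:
    ...  # call the real embedding model here

insert_embedding_func = with_limit(embed, 16)  # bulk document indexing
query_embedding_func = with_limit(embed, 4)    # queries keep their own pool
```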
@danielaskdd (Contributor, Author) commented:
@ParisNeo This update involves some core functionalities in LightRAG, and I hope you could help with a preliminary review of the PR.

@danielaskdd changed the title from "Fix concurrent problem extract entity and query at the same time" to "Fixes for concurrent document indexing and querying" on Jan 31, 2025
- Convert list to numpy array if needed
- Maintain existing functionality
- Introduce asyncio.Lock for save operations
- Ensure thread-safe file writes
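A minimal sketch of the lock-protected save from the commit above; the method and attribute names follow LightRAG's storage conventions but are assumptions rather than verbatim code:

```python
import asyncio

class NanoVectorDBStorage:
    def __init__(self, client):
        self._client = client                 # the NanoVectorDB instance
        self._save_lock = asyncio.Lock()

    async def index_done_callback(self):
        # Only one coroutine may persist the DB file at a time, preventing
        # interleaved or partial writes when indexing tasks finish together.
        async with self._save_lock:
            self._client.save()
```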
…ormance.

- Replace custom counter with asyncio.Semaphore
- The existing implementation cannot guarantee FIFO order
- Removed custom LLM function in entity extraction
- Simplified cache handling logic
- Added `force_llm_cache` parameter
- Updated cache handling conditions
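A hedged sketch of the force_llm_cache parameter from the commit above: entity extraction can force cache handling without mutating the shared enable_llm_cache flag (the race in problem 1). The signature and lookup are assumptions, not LightRAG's exact code:

```python
async def handle_cache(hashing_kv, args_hash, prompt, mode="default",
                       cache_type=None, force_llm_cache=False):
    if hashing_kv is None:
        return None
    # force_llm_cache bypasses the global flag instead of rewriting it,
    # so concurrent queries see their own, untouched configuration.
    if not (force_llm_cache
            or hashing_kv.global_config.get("enable_llm_cache", False)):
        return None
    return await hashing_kv.get_by_id(args_hash)  # cached response, if any
```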
@danielaskdd closed this on Feb 1, 2025
@danielaskdd deleted the fix-extract-entity-concurrent-problem branch on February 1, 2025 at 20:38