
Fixed concurrent problems for document indexing and user query #693

Merged
merged 35 commits into HKUDS:main from danielaskdd:fix-concurrent-problem on Feb 2, 2025

Conversation

danielaskdd
Contributor

@danielaskdd danielaskdd commented Feb 1, 2025

Problems Addressed

  1. When enable_llm_cache_for_entity_extract is True and enable_llm_cache is False, there is a race condition between document indexing and user queries running at the same time: extract_entities temporarily flips the global config enable_llm_cache to True, so concurrent user queries also temporarily use the cache.
  2. User queries are effectively blocked by document indexing. The limit_async_func_call decorator waits for a free slot by sleeping and polling, which is inefficient and gives no fairness guarantee, so user queries may not get a chance to run until document indexing completes.
  3. The disk persistence operations of NanoVectorDB are not thread-safe.

Solutions

  1. Entity Extraction Improvements:
  • Introduced a force_llm_cache parameter for the cache-handling function, forcing handle_cache to check the cache regardless of the global config settings.
  • Modified extract_entities to pass this parameter instead of mutating the global config, which controls the behavior of handle_cache without affecting concurrent queries.
  2. Concurrent Operation Optimization:
  • Replaced the concurrency-limiting mechanism of limit_async_func_call with asyncio.Semaphore, whose FIFO ordering prevents user queries from waiting indefinitely once document indexing has started (see the sketch after this list).
  • Removed the redundant semaphore logic from EmbeddingFunc.
  3. Introduced an asyncio.Lock to protect file save operations in NanoVectorDBStorage (also sketched below).
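
A minimal sketch of items 2 and 3, assuming a decorator named limit_async_func_call and a storage class holding a NanoVectorDB client; this is illustrative, not the PR's exact code:

```python
import asyncio
from functools import wraps


def limit_async_func_call(max_size: int):
    """Limit concurrent calls with asyncio.Semaphore instead of poll-and-sleep.

    Semaphore waiters are released in FIFO order, so a user query queued
    behind many indexing calls still gets a slot in turn rather than
    starving until indexing finishes.
    """
    def decorator(func):
        sem = asyncio.Semaphore(max_size)

        @wraps(func)
        async def wrapper(*args, **kwargs):
            async with sem:  # acquire one of max_size slots; released on exit
                return await func(*args, **kwargs)
        return wrapper
    return decorator


class NanoVectorDBStorageSketch:
    """Guard disk persistence with an asyncio.Lock so only one coroutine
    writes the vector DB file at a time."""

    def __init__(self, client):
        self._client = client            # e.g. a NanoVectorDB instance
        self._save_lock = asyncio.Lock()

    async def index_done_callback(self):
        async with self._save_lock:      # serialize concurrent save requests
            self._client.save()
```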

Bug fixes

  1. Fix quantize_embedding so it also accepts a plain Python list as the embedding parameter (see the sketch after this list).
  2. Fix prompt response cache failures when is_embedding_cache_enabled is true by adding cache_type to llm_response_cache and introducing a new extract cache type for the entity extraction phase.
  3. Fix LLM similarity verification failures by correcting how llm_model_func is retrieved.
  4. Update the similarity_check prompt to stop the LLM from occasionally generating two scores.
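
A minimal sketch of bug fix 1, assuming a quantize_embedding helper that scales an embedding into 8-bit integers; the signature and return values are illustrative, not the project's exact code:

```python
import numpy as np


def quantize_embedding(embedding, bits: int = 8):
    """Quantize an embedding vector into `bits`-bit integers.

    Converting a plain Python list to a numpy array up front is the point
    of the fix: without it, array methods such as .min()/.max() fail when
    a caller passes a list instead of an ndarray.
    """
    if isinstance(embedding, list):
        embedding = np.array(embedding)
    min_val, max_val = embedding.min(), embedding.max()
    scale = (max_val - min_val) / (2 ** bits - 1)
    if scale == 0:                 # constant vector; avoid division by zero
        scale = 1.0
    quantized = np.round((embedding - min_val) / scale).astype(np.uint8)
    return quantized, min_val, max_val
```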

Changes to API Server config at RAG initialization

  1. Disable LLM cache for entity extraction
  2. Enable embedding cache (both shown in the sketch below)
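
A minimal sketch of what the API server's rag initialization could look like after these changes, assuming LightRAG exposes the enable_llm_cache_for_entity_extract and embedding_cache_config parameters named in this PR; the surrounding arguments are placeholders:

```python
from lightrag import LightRAG  # assumed import path

rag = LightRAG(
    working_dir="./rag_storage",
    llm_model_func=llm_model_func,      # placeholder: your async LLM wrapper
    embedding_func=embedding_func,      # placeholder: your embedding wrapper
    enable_llm_cache_for_entity_extract=False,  # 1. disable LLM cache for extraction
    embedding_cache_config={
        "enabled": True,                # 2. enable embedding cache
        "similarity_threshold": 0.95,
    },
)
```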

- Abandon the approach of temporarily replacing the global llm_model_func configuration
- Introduce custom_llm function with new_config for handle_cache while extracting entities
- Update handle_cache to accept custom_llm
- Separate insert/query embedding funcs
- Add query-specific async limit
- Update storage classes to use new funcs
- Protect vector DB save with lock
- Improve config handling for thresholds
- Convert list to numpy array if needed
- Maintain existing functionality
- Introduce asyncio.Lock for save operations
- Ensure thread-safe file writes
…ormance.

- Replace custom counter with asyncio.Semaphore
- The existing implementation cannot follow the FIFO order
- Removed custom LLM function in entity extraction
- Simplified cache handling logic
- Added `force_llm_cache` parameter
- Updated cache handling conditions
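
The last two notes above describe the cache-handling change; a minimal sketch of the idea, assuming a handle_cache helper and a key-value storage exposing get_by_id (names and fields are illustrative, not the project's exact code):

```python
async def handle_cache(hashing_kv, args_hash, prompt,
                       cache_type=None, force_llm_cache=False):
    """Return a cached LLM response, or None on a cache miss.

    force_llm_cache overrides the global enable_llm_cache flag instead of
    mutating it, so entity extraction can use the cache without flipping
    shared config under concurrently running user queries.
    """
    if hashing_kv is None:
        return None
    enabled = force_llm_cache or hashing_kv.global_config.get("enable_llm_cache", False)
    if not enabled:
        return None
    cached = await hashing_kv.get_by_id(args_hash)
    if cached is not None and cached.get("cache_type") == cache_type:
        return cached["return"]
    return None
```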
danielaskdd marked this pull request as ready for review on February 1, 2025, 22:49
danielaskdd changed the title from "Fix concurrent problem for document indexing and user query" to "Fixed concurrent problems for document indexing and user query" on Feb 2, 2025
LarFii merged commit fade69a into HKUDS:main on Feb 2, 2025
1 check passed
danielaskdd deleted the fix-concurrent-problem branch on February 5, 2025, 18:32