Is AsyncWriter reliable to use? #578

Open
davidshen84 opened this issue Apr 1, 2023 · 0 comments

Hi,

I indexed the same set of documents with both BufferedWriter and AsyncWriter, and the search results from the AsyncWriter index are very poor, if not incorrect.

My indexing code using AsyncWriter looks like this:

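import logging
from multiprocessing import Pool
from typing import Dict

from whoosh.index import Index, create_in
from whoosh.writing import AsyncWriter

logger = logging.getLogger(__name__)


def add_document(data: Dict[str, str]) -> None:
    # Each call opens its own AsyncWriter on the index shared with this worker.
    with AsyncWriter(shared_ix) as writer:
        writer.add_document(id=str(data['id']), path=data['path'], content=data['content'])
        logger.info('added %s', data['path'])


def init_pool(ix: Index) -> None:
    # Runs once per worker process; stores the index in a module-level global.
    global shared_ix
    shared_ix = ix

# ...define schema...
ix = create_in(index_dir, schema)
with Pool(initializer=init_pool, initargs=(ix,)) as pool:
    pool.map(add_document, doc_set_list)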

There were no errors or warnings during indexing with AsyncWriter, but the resulting index folder is about 8 MB smaller than the one built with BufferedWriter.
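A smaller folder does not by itself prove that documents are missing, so comparing document counts may be a more direct check. The following is only a minimal sketch: the helper names `count_docs` and `shortfall` and the directory names in the usage comment are hypothetical, and it assumes each index was written to its own directory.

```python
def count_docs(index_dir: str) -> int:
    """Return the number of undeleted documents in the index at index_dir."""
    # Imported lazily so the pure helper below can be used without Whoosh.
    from whoosh.index import open_dir

    return open_dir(index_dir).doc_count()


def shortfall(expected: int, actual: int) -> int:
    """Return how many submitted documents did not end up in the index."""
    return max(expected - actual, 0)


# Hypothetical usage, assuming the two indexes live in separate directories:
#   expected = len(doc_set_list)
#   print(shortfall(expected, count_docs("index_buffered")))
#   print(shortfall(expected, count_docs("index_async")))
```

If the count for the AsyncWriter index is lower than the number of documents submitted, documents were dropped during indexing rather than merely stored more compactly.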

I understand the documentation says it is a sample implementation. Is it good enough for local development and evaluation?

Thanks
