Skip to content

Commit

Permalink
documentation improvement
Browse files Browse the repository at this point in the history
  • Loading branch information
Guest400123064 committed Apr 27, 2024
1 parent 22804ce commit bca2e0f
Show file tree
Hide file tree
Showing 9 changed files with 4,959 additions and 2 deletions.
51 changes: 51 additions & 0 deletions .github/workflows/docs.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
name: website

# build the documentation whenever there are new commits on main
on:
push:
branches:
- main
# Alternative: only build for tags.
# tags:
# - '*'

# security: restrict permissions for CI jobs.
permissions:
contents: read

jobs:
# Build the documentation and upload the static HTML files as an artifact.
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.9'

# ADJUST THIS: install all dependencies (including pdoc)
- run: pip install -e .
- run: pip install pdoc

# ADJUST THIS: build your documentation into docs/.
# We use a custom build script for pdoc itself, ideally you just run `pdoc -o docs/ ...` here.
- run: pdoc src/bbm25_haystack -o docs/

- uses: actions/upload-pages-artifact@v3
with:
path: docs/

# Deploy the artifact to GitHub pages.
# This is a separate job so that only actions/deploy-pages has the necessary permissions.
deploy:
needs: build
runs-on: ubuntu-latest
permissions:
pages: write
id-token: write
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}
steps:
- id: deployment
uses: actions/deploy-pages@v4
10 changes: 8 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,9 @@ $ pip install -e .

## Usage

The initializer takes [three BM25+ hyperparameters](https://en.wikipedia.org/wiki/Okapi_BM25), namely `k1`, `b`, and `delta`, one path to a trained SentencePiece tokenizer `.model` file, and a filtering logic flag ([see below](#filtering-logic)). All parameters are optional. The default tokenizer is directly copied from [LLaMA-2-7B-32K tokenizer](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K/blob/main/tokenizer.model) with a vocab size of 32,000.
### Quick Start

Below is an example of how you can build a minimal search engine with the `bbm25_haystack` components on their own. They are also compatible with [Haystack pipelines](https://docs.haystack.deepset.ai/docs/creating-pipelines).

```python
from haystack import Document
Expand All @@ -35,13 +37,17 @@ document_store = BetterBM25DocumentStore()
document_store.write_documents([
Document(content="There are over 7,000 languages spoken around the world today."),
Document(content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors."),
Document(content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.")
Document(content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bio-luminescent waves.")
])

retriever = BetterBM25Retriever(document_store)
retriever.run(query="How many languages are spoken around the world today?")
```

### API References

The initializer takes [three BM25+ hyperparameters](https://en.wikipedia.org/wiki/Okapi_BM25), namely `k1`, `b`, and `delta`, and path to a trained SentencePiece tokenizer `.model` file. All parameters are optional. The default tokenizer is directly copied from [LLaMA-2-7B-32K tokenizer](https://huggingface.co/togethercomputer/LLaMA-2-7B-32K/blob/main/tokenizer.model) with a vocab size of 32,000.

## Filtering Logic

The current document store uses `document_matches_filter` shipped with Haystack to perform filtering by default, which is the same as `InMemoryDocumentStore` except that it is DOES NOT support legacy operator names.
Expand Down
1,623 changes: 1,623 additions & 0 deletions docs/bbm25_haystack.html

Large diffs are not rendered by default.

246 changes: 246 additions & 0 deletions docs/bbm25_haystack/__about__.html

Large diffs are not rendered by default.

850 changes: 850 additions & 0 deletions docs/bbm25_haystack/bbm25_retriever.html

Large diffs are not rendered by default.

1,529 changes: 1,529 additions & 0 deletions docs/bbm25_haystack/bbm25_store.html

Large diffs are not rendered by default.

599 changes: 599 additions & 0 deletions docs/bbm25_haystack/filters.html

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions docs/index.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="refresh" content="0; url=./bbm25_haystack.html"/>
</head>
</html>
46 changes: 46 additions & 0 deletions docs/search.js

Large diffs are not rendered by default.

0 comments on commit bca2e0f

Please sign in to comment.