# Integration: AWS Bedrock

This guide walks through an example using the Amazon Bedrock SDK (`boto3`) with `vecs`. We will create embeddings using the Amazon Titan Embeddings G1 - Text v1.2 model (`amazon.titan-embed-text-v1`), insert those embeddings into a Postgres database using `vecs`, and then query the collection to find the sentences most similar to a given query sentence.

## Create an Environment

First, set up your environment. You will need Python 3.7+ with the `vecs` and `boto3` libraries installed.

You can install the necessary Python libraries using pip:

```sh
pip install vecs boto3
```

You'll also need:

- [Credentials to your AWS account](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html)
- [A Postgres database with the pgvector extension](hosting.md)

## Create Embeddings

Next, we use Amazon's Titan Embeddings G1 - Text v1.2 model to create embeddings for a set of sentences.
```python
import json

import boto3

client = boto3.client(
    'bedrock-runtime',
    region_name='us-east-1',
    # Credentials from your AWS account
    aws_access_key_id='<replace_your_own_credentials>',
    aws_secret_access_key='<replace_your_own_credentials>',
    aws_session_token='<replace_your_own_credentials>',
)

dataset = [
    "The cat sat on the mat.",
    "The quick brown fox jumps over the lazy dog.",
    "Friends, Romans, countrymen, lend me your ears",
    "To be or not to be, that is the question.",
]

embeddings = []

for sentence in dataset:
    # invoke the embeddings model for each sentence
    response = client.invoke_model(
        body=json.dumps({"inputText": sentence}),
        modelId="amazon.titan-embed-text-v1",
        accept="application/json",
        contentType="application/json",
    )
    # collect the embedding from the response
    response_body = json.loads(response["body"].read())
    # add an (id, vector, metadata) record to the embedding list
    embeddings.append((sentence, response_body.get("embedding"), {}))
```
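Each entry appended above is an `(id, vector, metadata)` triple, the record shape `vecs` expects for upserts. A minimal sketch of that shape using a dummy vector (illustrative only; a real Titan embedding is a list of 1536 floats returned by the model):

```python
# Dummy stand-in for a Titan embedding (assumption: illustrative only;
# the real model returns 1536 floats per input text)
dummy_embedding = [0.0] * 1536

# the sentence itself doubles as the record id; metadata is empty in this guide
record = ("The cat sat on the mat.", dummy_embedding, {})

sentence_id, vector, metadata = record
print(sentence_id)
print(len(vector))
print(metadata)
```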

### Store the Embeddings with vecs

Now that we have our embeddings, we can insert them into a Postgres database using `vecs`.
```python
import vecs

DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
vx = vecs.Client(DB_CONNECTION)

# create a collection named 'sentences' with 1536-dimensional vectors
# to match the output dimension of the Titan Embeddings G1 - Text model
sentences = vx.get_or_create_collection(name="sentences", dimension=1536)

# upsert the embeddings into the 'sentences' collection
sentences.upsert(records=embeddings)

# create an index for the 'sentences' collection
sentences.create_index()
```

### Querying for Most Similar Sentences

Now we query the `sentences` collection to find the sentences most similar to a sample query sentence. First, we create an embedding for the query sentence using the same Titan model. Then we query the collection we created earlier with that embedding.
```python
query_sentence = "A quick animal jumps over a lazy one."

# create vector store client
vx = vecs.Client(DB_CONNECTION)

# create an embedding for the query sentence
response = client.invoke_model(
    body=json.dumps({"inputText": query_sentence}),
    modelId="amazon.titan-embed-text-v1",
    accept="application/json",
    contentType="application/json",
)

response_body = json.loads(response["body"].read())

query_embedding = response_body.get("embedding")

# query the 'sentences' collection for the most similar sentences,
# including each match's distance to the query vector
results = sentences.query(
    data=query_embedding,
    limit=3,
    include_value=True,
)

# print the results
for result in results:
    print(result)
```

This returns the 3 most similar records and their distance to the query vector:

```
('The quick brown fox jumps over the lazy dog.', 0.27600620558852)
('The cat sat on the mat.', 0.609986272479202)
('To be or not to be, that is the question.', 0.744849503688346)
```
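As a rough intuition for these numbers: assuming the collection uses cosine distance (the `vecs` default measure), the value is `1 - cosine_similarity`, so vectors pointing in the same direction score 0 and orthogonal vectors score 1, and smaller values mean more similar. A toy computation on small hand-made vectors, not the real embeddings:

```python
import math

def cosine_distance(a, b):
    # cosine distance = 1 - (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```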