Merge pull request #1345 from Jacksonxhx/JacksonX
Update integration notebook format and Add Jina Semantic Search
zc277584121 authored May 30, 2024
2 parents aedf550 + 3aeac82 commit 093c03f
Showing 6 changed files with 185 additions and 169 deletions.
12 changes: 4 additions & 8 deletions bootcamp/tutorials/integration/milvus_and_DSPy.ipynb
@@ -93,8 +93,8 @@
{
"cell_type": "code",
"source": [
"!pip install -U pymilvus\n",
"!pip install \"dspy-ai[milvus]\""
"!pip install \"dspy-ai[milvus]\"\n",
"!pip install -U pymilvus"
],
"metadata": {
"collapsed": false
@@ -192,15 +192,11 @@
" )"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-05-29T09:25:37.564410Z",
"start_time": "2024-05-29T09:17:39.288188Z"
}
"collapsed": false
},
"id": "b5de2cdb94b92e2c",
"outputs": [],
"execution_count": 2
"execution_count": null
},
{
"cell_type": "markdown",
195 changes: 160 additions & 35 deletions bootcamp/tutorials/integration/milvus_with_Jina.ipynb
@@ -1,18 +1,23 @@
{
"cells": [
{
"metadata": {},
"metadata": {
"id": "e17219815f8987d9"
},
"cell_type": "markdown",
"source": "<a href=\"https://colab.research.google.com/github/milvus-io/bootcamp/blob/master/bootcamp/tutorials/integration/milvus_with_Jina.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>",
"source": "# Integrate Milvus with Jina AI",
"id": "e17219815f8987d9"
},
{
"cell_type": "markdown",
"source": [
"# Jina"
"<a href=\"https://colab.research.google.com/github/milvus-io/bootcamp/blob/master/bootcamp/tutorials/integration/milvus_with_Jina.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
"\n",
"This guide demonstrates how to use Jina AI embeddings and Milvus to conduct similarity search and retrieval tasks."
],
"metadata": {
"collapsed": false
"collapsed": false,
"id": "20180cc31eaf0495"
},
"id": "20180cc31eaf0495"
},
@@ -24,18 +29,20 @@
"Jina AI's cutting-edge embeddings boast top-tier performance, featuring a model with an 8192-token context length that is ideal for comprehensive data representation. Offering multilingual support and seamless integration with leading platforms like OpenAI, these embeddings facilitate cross-lingual applications."
],
"metadata": {
"collapsed": false
"collapsed": false,
"id": "990b149ce5c688b2"
},
"id": "990b149ce5c688b2"
},
{
"cell_type": "markdown",
"source": [
"## Milvus and Jina AI's Embedding\n",
"Storing and searching these embeddings efficiently at scale requires purpose-built infrastructure. Milvus is a widely known, advanced open-source vector database capable of handling large-scale vector data. Milvus enables fast and accurate vector (embedding) search across a variety of metrics. Its scalability allows for seamless handling of massive volumes of data, ensuring high-performance search operations even as datasets grow. "
"Storing and searching these embeddings efficiently at scale requires purpose-built infrastructure. Milvus is a widely known, advanced open-source vector database capable of handling large-scale vector data. Milvus enables fast and accurate vector (embedding) search across a variety of metrics. Its scalability allows for seamless handling of massive volumes of data, ensuring high-performance search operations even as datasets grow."
],
"metadata": {
"collapsed": false
"collapsed": false,
"id": "d27fd9a9cf451a02"
},
"id": "d27fd9a9cf451a02"
},
@@ -48,7 +55,8 @@
"Before we start, we need to install the model library for PyMilvus."
],
"metadata": {
"collapsed": false
"collapsed": false,
"id": "4ff70dd614666672"
},
"id": "4ff70dd614666672"
},
@@ -59,11 +67,41 @@
"!pip install \"milvus[model]\""
],
"metadata": {
"collapsed": false
"id": "f748781570cc911f",
"ExecuteTime": {
"end_time": "2024-05-30T07:34:20.652135Z",
"start_time": "2024-05-30T07:34:12.092107Z"
}
},
"id": "f748781570cc911f",
"outputs": [],
"execution_count": null
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pymilvus in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (2.4.3)\r\n",
"Requirement already satisfied: setuptools>=67 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (70.0.0)\r\n",
"Requirement already satisfied: grpcio<=1.63.0,>=1.49.1 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (1.63.0)\r\n",
"Requirement already satisfied: protobuf>=3.20.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (3.20.2)\r\n",
"Requirement already satisfied: environs<=9.5.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (9.5.0)\r\n",
"Requirement already satisfied: ujson>=2.0.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (5.10.0)\r\n",
"Requirement already satisfied: pandas>=1.2.4 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (2.2.2)\r\n",
"Requirement already satisfied: milvus-lite<2.5.0,>=2.4.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (2.4.5)\r\n",
"Requirement already satisfied: marshmallow>=3.0.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from environs<=9.5.0->pymilvus) (3.21.2)\r\n",
"Requirement already satisfied: python-dotenv in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from environs<=9.5.0->pymilvus) (1.0.1)\r\n",
"Requirement already satisfied: numpy>=1.26.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pandas>=1.2.4->pymilvus) (1.26.4)\r\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pandas>=1.2.4->pymilvus) (2.9.0.post0)\r\n",
"Requirement already satisfied: pytz>=2020.1 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pandas>=1.2.4->pymilvus) (2024.1)\r\n",
"Requirement already satisfied: tzdata>=2022.7 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pandas>=1.2.4->pymilvus) (2024.1)\r\n",
"Requirement already satisfied: packaging>=17.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from marshmallow>=3.0.0->environs<=9.5.0->pymilvus) (24.0)\r\n",
"Requirement already satisfied: six>=1.5 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas>=1.2.4->pymilvus) (1.16.0)\r\n",
"Requirement already satisfied: milvus[model] in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (2.3.5)\r\n",
"\u001b[33mWARNING: milvus 2.3.5 does not provide the extra 'model'\u001b[0m\u001b[33m\r\n",
"\u001b[0m"
]
}
],
"execution_count": 1
},
{
"cell_type": "markdown",
@@ -72,7 +110,8 @@
"Jina AI's core embedding model excels at understanding detailed text, making it ideal for semantic search and content classification. It therefore supports advanced sentiment analysis, text summarization, and personalized recommendation systems."
],
"metadata": {
"collapsed": false
"collapsed": false,
"id": "c9251246e4ce2edb"
},
"id": "c9251246e4ce2edb"
},
@@ -91,7 +130,7 @@
"dvecs = ef.encode_documents([doc])"
],
"metadata": {
"collapsed": false
"id": "541e01d196bfb8fd"
},
"id": "541e01d196bfb8fd",
"outputs": [],
@@ -104,7 +143,8 @@
"Jina AI's bilingual models enhance multilingual platforms, global support, and cross-lingual content discovery. Designed for German-English and Chinese-English translations, they foster understanding among diverse linguistic groups, simplifying interactions across languages."
],
"metadata": {
"collapsed": false
"collapsed": false,
"id": "76bf2f06073c10c"
},
"id": "76bf2f06073c10c"
},
@@ -123,7 +163,7 @@
"dvecs = ef.encode_documents([doc])"
],
"metadata": {
"collapsed": false
"id": "7877da246e95292b"
},
"id": "7877da246e95292b",
"outputs": [],
@@ -136,7 +176,8 @@
"Jina AI's code embedding model enables search across code and documentation. It supports English and 30 popular programming languages, and can be used to enhance code navigation, streamline code review, and assist with automated documentation."
],
"metadata": {
"collapsed": false
"collapsed": false,
"id": "5086f6f097d5de36"
},
"id": "5086f6f097d5de36"
},
@@ -181,51 +222,113 @@
"dvecs = ef.encode_documents([doc])"
],
"metadata": {
"collapsed": false
"id": "b54e81a1863eadbc"
},
"id": "b54e81a1863eadbc",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Jina Reranker\n",
"Jina AI also provides rerankers to further improve retrieval quality after an embedding-based search."
"## Semantic Search with Jina & Milvus\n",
"With this embedding function, we can combine the embeddings generated by Jina AI models with the Milvus Lite vector database to perform semantic search."
],
"metadata": {
"collapsed": false
},
"id": "c067c5218388d11f"
"id": "3fb7ecc7c0bb19ef"
},
{
"metadata": {},
"cell_type": "code",
"source": [
"from pymilvus.model.reranker import JinaRerankFunction\n",
"from pymilvus.model.dense import JinaEmbeddingFunction\n",
"from pymilvus import MilvusClient\n",
"\n",
"jina_api_key = \"<YOUR_JINA_API_KEY>\"\n",
"ef = JinaEmbeddingFunction(\"jina-embeddings-v2-base-en\", jina_api_key)\n",
"DIMENSION = 768 # size of jina-embeddings-v2-base-en\n",
"\n",
"rf = JinaRerankFunction(\"jina-reranker-v1-base-en\", jina_api_key)\n",
"\n",
"query = \"What event in 1956 marked the official birth of artificial intelligence as a discipline?\"\n",
"\n",
"documents = [\n",
"doc = [\n",
" \"In 1950, Alan Turing published his seminal paper, 'Computing Machinery and Intelligence,' proposing the Turing Test as a criterion of intelligence, a foundational concept in the philosophy and development of artificial intelligence.\",\n",
" \"The Dartmouth Conference in 1956 is considered the birthplace of artificial intelligence as a field; here, John McCarthy and others coined the term 'artificial intelligence' and laid out its basic goals.\",\n",
" \"In 1951, British mathematician and computer scientist Alan Turing also developed the first program designed to play chess, demonstrating an early example of AI in game strategy.\",\n",
" \"The invention of the Logic Theorist by Allen Newell, Herbert A. Simon, and Cliff Shaw in 1955 marked the creation of the first true AI program, which was capable of solving logic problems, akin to proving mathematical theorems.\",\n",
"]\n",
"\n",
"rf(query, documents)"
"dvecs = ef.encode_documents(doc)\n",
"\n",
"data = [\n",
" {\"id\": i, \"vector\": dvecs[i], \"text\": doc[i], \"subject\": \"history\"}\n",
" for i in range(len(dvecs))\n",
"]\n",
"\n",
"milvus_client = MilvusClient(\"./milvus_jina_demo.db\")\n",
"COLLECTION_NAME = \"demo_collection\" # Milvus collection name\n",
"milvus_client.create_collection(collection_name=COLLECTION_NAME, dimension=DIMENSION)\n",
"\n",
"res = milvus_client.insert(collection_name=COLLECTION_NAME, data=data)\n",
"\n",
"print(res[\"insert_count\"])"
],
"id": "83dc520d0684b82e",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"cell_type": "markdown",
"source": "With all the data in the Milvus vector database, we can now perform semantic search by generating a vector embedding for the query and conducting a vector search.",
"id": "774929336febc81d"
},
{
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-05-29T08:01:37.037015Z",
"start_time": "2024-05-29T08:01:36.434321Z"
"end_time": "2024-05-30T07:44:54.881785Z",
"start_time": "2024-05-30T07:44:54.396756Z"
}
},
"id": "1953c1d0c63f53b",
"cell_type": "code",
"source": [
"queries = \"What event in 1956 marked the official birth of artificial intelligence as a discipline?\"\n",
"qvecs = ef.encode_queries([queries])\n",
"\n",
"res = milvus_client.search(\n",
" collection_name=COLLECTION_NAME, # target collection\n",
" data=[qvecs[0]], # query vectors\n",
" limit=3, # number of returned entities\n",
" output_fields=[\"text\", \"subject\"], # specifies fields to be returned\n",
")[0]\n",
"\n",
"for result in res:\n",
" print(result)"
],
"id": "19e8c0e5e7b49c5b",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'id': 1, 'distance': 0.8802614808082581, 'entity': {'text': \"The Dartmouth Conference in 1956 is considered the birthplace of artificial intelligence as a field; here, John McCarthy and others coined the term 'artificial intelligence' and laid out its basic goals.\", 'subject': 'history'}}\n"
]
}
],
"execution_count": 10
},
{
"cell_type": "markdown",
"source": [
"## Jina Reranker\n",
"Jina AI also provides rerankers to further improve retrieval quality after an embedding-based search."
],
"metadata": {
"collapsed": false,
"id": "c067c5218388d11f"
},
"id": "c067c5218388d11f"
},
{
"metadata": {},
"cell_type": "code",
"outputs": [
{
"data": {
@@ -241,7 +344,26 @@
"output_type": "execute_result"
}
],
"execution_count": 6
"execution_count": null,
"source": [
"from pymilvus.model.reranker import JinaRerankFunction\n",
"\n",
"jina_api_key = \"<YOUR_JINA_API_KEY>\"\n",
"\n",
"rf = JinaRerankFunction(\"jina-reranker-v1-base-en\", jina_api_key)\n",
"\n",
"query = \"What event in 1956 marked the official birth of artificial intelligence as a discipline?\"\n",
"\n",
"documents = [\n",
" \"In 1950, Alan Turing published his seminal paper, 'Computing Machinery and Intelligence,' proposing the Turing Test as a criterion of intelligence, a foundational concept in the philosophy and development of artificial intelligence.\",\n",
" \"The Dartmouth Conference in 1956 is considered the birthplace of artificial intelligence as a field; here, John McCarthy and others coined the term 'artificial intelligence' and laid out its basic goals.\",\n",
" \"In 1951, British mathematician and computer scientist Alan Turing also developed the first program designed to play chess, demonstrating an early example of AI in game strategy.\",\n",
" \"The invention of the Logic Theorist by Allen Newell, Herbert A. Simon, and Cliff Shaw in 1955 marked the creation of the first true AI program, which was capable of solving logic problems, akin to proving mathematical theorems.\",\n",
"]\n",
"\n",
"rf(query, documents)"
],
"id": "1953c1d0c63f53b"
}
],
"metadata": {
@@ -261,6 +383,9 @@
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
},
"colab": {
"provenance": []
}
},
"nbformat": 4,
12 changes: 2 additions & 10 deletions bootcamp/tutorials/integration/qa_with_milvus_and_hf.ipynb
@@ -5,8 +5,7 @@
"metadata": {},
"source": [
"<a href=\"https://colab.research.google.com/github/milvus-io/bootcamp/blob/master/bootcamp/tutorials/integration/qa_with_milvus_and_hf.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
],
"outputs": []
]
},
{
"cell_type": "markdown",
@@ -283,13 +282,6 @@
" )\n",
" print(\"\\n\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -313,4 +305,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
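
To make the semantic search step added to milvus_with_Jina.ipynb more concrete, here is a minimal sketch of the ranking that `milvus_client.search(..., limit=...)` asks for. It assumes cosine similarity as the metric and uses tiny hand-made 3-dimensional vectors in place of real 768-dimensional Jina embeddings. The helpers `cosine_similarity` and `top_k` are hypothetical names introduced for illustration; this brute-force scan is not how Milvus executes a search, and it calls neither the Jina API nor PyMilvus.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k):
    """Score every row against the query and return the k best, highest first."""
    scored = [
        {
            "id": row["id"],
            "distance": cosine_similarity(query_vec, row["vector"]),
            "entity": {"text": row["text"]},
        }
        for row in rows
    ]
    scored.sort(key=lambda hit: hit["distance"], reverse=True)
    return scored[:k]


# Toy stand-ins for document embeddings (made up for illustration only).
texts = ["Turing Test, 1950", "Dartmouth Conference, 1956", "Turing chess program, 1951"]
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Same row layout as the notebook's `data` list: one dict per document.
rows = [{"id": i, "vector": vectors[i], "text": texts[i]} for i in range(len(vectors))]

# Toy stand-in for the query embedding; it points mostly along document 1.
query_vec = [0.1, 0.9, 0.2]
hits = top_k(query_vec, rows, k=2)
for hit in hits:
    print(hit)
```

The hit dictionaries mirror the `id` / `distance` / `entity` shape visible in the notebook's recorded search output, so the toy result can be read the same way as a real one.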