Add new Model Hub notebook tutorial demonstrating interoperability with certain HuggingFace libraries.
orendain committed Dec 18, 2024
1 parent 1308edb commit 89b455b
Showing 1 changed file with 355 additions and 0 deletions.
9 Model Hub: Using Hugging Face libraries.ipynb (355 additions, 0 deletions)
@@ -0,0 +1,355 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "7f035c46-7f0f-4e61-b34b-97583b278829",
"metadata": {},
"source": [
"# Model Hub - Using Hugging Face libraries\n",
"\n",
"This notebook walks through using Hugging Face libraries with models stored in H2O Model Hub.\n",
"\n",
"We will cover:\n",
"- Determining H2O AI Cloud values to use with Hugging Face libraries\n",
" - Endpoint URL\n",
" - Access token\n",
"- Using Hugging Face libraries with models in H2O AI Cloud\n",
" - Configuring the Hugging Face library\n",
" - Example: Using the `pipeline` API from the Hugging Face `transformers` library\n",
" - Example: Downloading single model files with the Hugging Face `huggingface_hub` library"
]
},
{
"cell_type": "markdown",
"id": "63b0398b-9131-4bb7-aab6-a4e4b607fd58",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Let's install the packages we'll use and choose the model we'll work with."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8532bf1-82fc-4867-901c-7f56c5167f8c",
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"!{sys.executable} -m pip install -q \"h2o-authn[discovery]\"\n",
"!{sys.executable} -m pip install -q huggingface_hub\n",
"!{sys.executable} -m pip install -q transformers\n",
"!{sys.executable} -m pip install -q torch --index-url https://download.pytorch.org/whl/cpu"
]
},
{
"cell_type": "markdown",
"id": "eb5f3d4e-52be-4b79-b685-3e97e2e24e5f",
"metadata": {},
"source": [
"> 📢 Important\n",
">\n",
"> This notebook assumes that a model has already been uploaded to H2O Model Hub, which is not the case in new environments. Replace the value below with a model that you know exists in your environment.\n",
"\n",
"Next, let's choose the model to read from Model Hub. It is specified in the standard Hugging Face \"repo ID\" format (`<namespace>/<model-name>`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5771f7e9-2907-4c9c-8d04-eb153104096f",
"metadata": {},
"outputs": [],
"source": [
"available_modelhub_model = \"albert/albert-base-v2\""
]
},
{
"cell_type": "markdown",
"id": "c31aba3c-4cbe-4c75-a9c1-5959f7dd871b",
"metadata": {},
"source": [
"## Determining H2O AI Cloud values to use with Hugging Face libraries"
]
},
{
"cell_type": "markdown",
"id": "ab72efd9-5d0c-4733-b000-ff88ade56359",
"metadata": {},
"source": [
"> 📢 Important\n",
">\n",
"> This section assumes that an H2O AI Cloud environment can be discovered from your environment.\n",
"> In local environments, this means having the H2O CLI installed and configured.\n",
">\n",
"> For other ways to discover the required H2O AI Cloud environment, see the notebook titled _\"Drive - Connecting from different environments\"_.\n",
"\n",
"We'll start by discovering the H2O AI Cloud that the current environment is configured for."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "36fc1fdb-2c9b-4187-9219-69bfe0739b36",
"metadata": {},
"outputs": [],
"source": [
"import h2o_discovery\n",
"\n",
"discovery = h2o_discovery.discover()"
]
},
{
"cell_type": "markdown",
"id": "8d8b2790-e4d4-4026-94b2-e03ccbadd7cc",
"metadata": {},
"source": [
"### Endpoint URL"
]
},
{
"cell_type": "markdown",
"id": "0aa42106-ec44-4dca-8ff1-08611b80998f",
"metadata": {},
"source": [
"We can use the `discovery` object to find the URL of H2O Model Hub, the H2O service that exposes a set of Hugging Face-compatible APIs."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6a55451-a840-4006-b3c1-00c3bdbb9739",
"metadata": {},
"outputs": [],
"source": [
"base_modelhub_url = discovery.services[\"modelhub\"].uri"
]
},
{
"cell_type": "markdown",
"id": "8b9fe869-ce6a-4f04-92d1-e737dc221583",
"metadata": {},
"source": [
"A neat feature of Model Hub is its support for multiple isolated, virtual registries.\n",
"\n",
"For this tutorial, we're only concerned with the \"global\" registry, which by default grants read access to all authenticated users. This is analogous to Hugging Face allowing any user to download public models.\n",
"\n",
"The final endpoint combines Model Hub's base URL with the name of the virtual registry (\"global\") we want to work with."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bd26c686-90d5-4f8b-bff3-8102ffdccf25",
"metadata": {},
"outputs": [],
"source": [
"modelhub_endpoint = f\"{base_modelhub_url}/global\""
]
},
{
"cell_type": "markdown",
"id": "af60a551-65d2-4579-b8aa-a010e83fdd35",
"metadata": {},
"source": [
"### Access token"
]
},
{
"cell_type": "markdown",
"id": "f1b2fdc2-c6d6-468d-b7f6-008dc3c11a72",
"metadata": {},
"source": [
"All requests to Model Hub must be authenticated. The `h2o_authn` package supplies a helper for creating a token provider for H2O AI Cloud.\n",
"\n",
"We create that token provider and use it to generate an access token for Model Hub."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8661323-6de8-423a-904b-a14b7df807dc",
"metadata": {},
"outputs": [],
"source": [
"import h2o_authn.discovery\n",
"\n",
"token_provider = h2o_authn.discovery.create(discovery)\n",
"access_token = str(token_provider.token())"
]
},
{
"cell_type": "markdown",
"id": "886c37fc-2f3d-44d7-ba3f-f10d16c5dc0c",
"metadata": {},
"source": [
"## Using Hugging Face libraries with models in H2O AI Cloud"
]
},
{
"cell_type": "markdown",
"id": "b9ae47d8-2fba-45ea-a6b5-f3d47cf3362d",
"metadata": {},
"source": [
"### Configuring the Hugging Face library"
]
},
{
"cell_type": "markdown",
"id": "b39f7868-3cfc-4967-9957-66c7c34c4bce",
"metadata": {},
"source": [
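"For Hugging Face Python libraries to route requests to H2O Model Hub, we set the `HF_ENDPOINT` environment variable. **This must be set before any Hugging Face library is loaded for the first time**, because `huggingface_hub` reads the variable once, when it is first imported. If a Hugging Face library has already been imported in this kernel, restart the kernel before continuing.\n",
"\n",
"Some Hugging Face library functions also accept an `endpoint` argument, which can be passed instead of setting the environment variable."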
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7360e26a-ea32-4f78-a548-2d54aa8ccf03",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"HF_ENDPOINT\"] = modelhub_endpoint"
]
},
{
"cell_type": "markdown",
"id": "576fb42b-35c5-4576-8e23-15d59f95f22f",
"metadata": {},
"source": [
"Setting the `HF_TOKEN` environment variable to our H2O access token lets Hugging Face library operations authenticate with H2O AI Cloud.\n",
"\n",
"Some Hugging Face library functions also accept a `token` argument, which can be passed instead of setting the environment variable."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3ff77363-ced9-4221-ab30-8985cc2f0087",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"HF_TOKEN\"] = access_token"
]
},
{
"cell_type": "markdown",
"id": "26f46f9b-266f-469b-9837-62c1e0845835",
"metadata": {},
"source": [
"### Example: Using the `pipeline` API from the Hugging Face `transformers` library"
]
},
{
"cell_type": "markdown",
"id": "69835cf7-2c40-461f-9aa8-39f50da24d07",
"metadata": {},
"source": [
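"Let's walk through running a model stored in H2O AI Cloud with Hugging Face libraries.\n",
"\n",
"The `pipeline` API is a popular API in Hugging Face's `transformers` library. Pipelines group together a pretrained model with the preprocessing that was used during that model's training.\n",
"\n",
"Having already set `HF_ENDPOINT` and `HF_TOKEN` to point to, and authenticate with, H2O AI Cloud, we use the Hugging Face `pipeline` API as usual.\n",
"\n",
"Note that `albert/albert-base-v2` is a base model without a fine-tuned classification head, so `transformers` initializes that head with random weights (and logs a warning). The labels it predicts are therefore not meaningful. To get meaningful predictions, set `available_modelhub_model` to a model fine-tuned for sentiment analysis."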
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4224f42a-2286-4911-9879-471eadcd1e55",
"metadata": {},
"outputs": [],
"source": [
"from transformers import pipeline\n",
"\n",
"sentiment_analysis = pipeline(\n",
" \"sentiment-analysis\",\n",
" model=available_modelhub_model,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8a684b7-6b19-4a5b-81a5-57b303403e56",
"metadata": {},
"outputs": [],
"source": [
"sentiment_analysis(\"That's a pretty cool shirt.\")"
]
},
{
"cell_type": "markdown",
"id": "c0c57f7f-cdc1-4acd-9ecf-e4ab061494d8",
"metadata": {},
"source": [
"### Example: Downloading single model files with the Hugging Face `huggingface_hub` library"
]
},
{
"cell_type": "markdown",
"id": "c2ed87c7-5d75-477c-acaa-33f47c469134",
"metadata": {},
"source": [
"A common way to download individual model files is via Hugging Face's `hf_hub_download` function. It downloads the remote file, caches it on disk (in a version-aware way), and returns its local file path.\n",
"\n",
"Here, we use it to download and cache the `config.json` file from our model in H2O AI Cloud."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e36257c6-8f1c-45f3-a5f8-28f9023c8151",
"metadata": {},
"outputs": [],
"source": [
"import huggingface_hub\n",
"\n",
"huggingface_hub.hf_hub_download(repo_id=available_modelhub_model, filename=\"config.json\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": true
},
"toc-autonumbering": false,
"toc-showcode": false,
"toc-showmarkdowntxt": false,
"toc-showtags": false
},
"nbformat": 4,
"nbformat_minor": 5
}
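The notebook's configuration steps can be bundled into a single helper. Those steps are: join Model Hub's base URL with the `global` registry name, then set `HF_ENDPOINT` and `HF_TOKEN` before any Hugging Face library is imported.

The sketch below is hypothetical. `configure_hf_for_modelhub` and its `registry` parameter are illustrative names, not part of `h2o_authn`, `h2o_discovery`, or any Hugging Face package. The helper uses only the standard library. It warns if `huggingface_hub` (which `transformers` depends on) has already been loaded, because `HF_ENDPOINT` is read at import time.

```python
import os
import sys
import warnings


def configure_hf_for_modelhub(
    base_modelhub_url: str,
    access_token: str,
    registry: str = "global",
) -> str:
    """Point Hugging Face libraries at an H2O Model Hub virtual registry.

    Hypothetical helper for illustration; not part of any H2O or Hugging Face
    package. Returns the endpoint URL that was configured.
    """
    # Join the base URL and registry name without producing a double slash.
    endpoint = f"{base_modelhub_url.rstrip('/')}/{registry.strip('/')}"

    # huggingface_hub reads HF_ENDPOINT once, at import time, so setting it
    # after the module is loaded may have no effect in this process.
    if "huggingface_hub" in sys.modules:
        warnings.warn(
            "huggingface_hub is already imported; HF_ENDPOINT may be ignored. "
            "Restart the kernel and configure the environment first.",
            stacklevel=2,
        )

    # Route Hugging Face requests to Model Hub and authenticate with H2O.
    os.environ["HF_ENDPOINT"] = endpoint
    os.environ["HF_TOKEN"] = access_token
    return endpoint
```

In the notebook, you could call `configure_hf_for_modelhub(base_modelhub_url, access_token)` before importing `transformers`. That one call would replace the separate cells that build `modelhub_endpoint` and set the two environment variables. The helper warns instead of raising, which leaves the decision to restart the kernel to the reader.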
