Extend inference SDK with client for (almost all) core models #212

Merged: 6 commits, Dec 28, 2023

87 changes: 52 additions & 35 deletions docs/foundation/clip.md
@@ -15,6 +15,12 @@ In this guide, we will show:
1. How to classify video frames with CLIP in real time, and;
2. How to calculate CLIP image and text embeddings for use in clustering and comparison.

## How can I use the CLIP model in `inference`?
* directly from the `inference[clip]` package, integrating the model into your own code
* through the `inference` HTTP API (hosted locally or on the Roboflow platform), integrating via the HTTP protocol:
    * using the `inference-sdk` package (`pip install inference-sdk`) and [`InferenceHTTPClient`](/docs/inference_sdk/http_client.md)
    * writing custom code that makes HTTP requests (see the [API Reference](https://inference.roboflow.com/api/)); a minimal raw-HTTP sketch follows this list
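
As a rough illustration of the last option, the sketch below calls the same `/clip/embed_text` route used by the SDK examples later in this guide. The server URL and the `ROBOFLOW_API_KEY` environment variable are assumptions you should adapt to your setup.

```python
import os

import requests

# assumed server address - use http://localhost:9001 for a self-hosted `inference` server
base_url = "https://infer.roboflow.com"
api_key = os.environ["ROBOFLOW_API_KEY"]

response = requests.post(
    f"{base_url}/clip/embed_text?api_key={api_key}",
    json={"text": "the quick brown fox jumped over the lazy dog"},
)
response.raise_for_status()
print(response.json()["embeddings"])
```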

## Classify Video Frames

With CLIP, you can classify images and video frames without training a model. This is because CLIP has been pre-trained to recognize many different objects.
@@ -98,56 +104,50 @@ Below we show how to calculate, then compare, both types of embeddings.

### Image Embedding

!!! tip

    This example assumes the `inference-sdk` package is installed:

    ```
    pip install inference-sdk
    ```

In the code below, we calculate an image embedding.

Create a new Python file and add this code:

```python
import os
-import requests
-
-# Define Request Payload
-infer_clip_payload = {
-    # Images can be provided as urls or as base64 encoded strings
-    "image": {
-        "type": "url",
-        "value": "https://i.imgur.com/Q6lDy8B.jpg",
-    },
-}
-
-# Define inference server url (localhost:9001, infer.roboflow.com, etc.)
-base_url = "https://infer.roboflow.com"
-
-# Define your Roboflow API Key
-api_key = os.environ["API_KEY"]
-
-res = requests.post(
-    f"{base_url}/clip/embed_image?api_key={api_key}",
-    json=infer_clip_payload,
-)
-
-embeddings = res.json()['embeddings']
+from inference_sdk import InferenceHTTPClient
+
+CLIENT = InferenceHTTPClient(
+    api_url="https://infer.roboflow.com",
+    api_key=os.environ["ROBOFLOW_API_KEY"],
+)
+embeddings = CLIENT.get_clip_image_embeddings(inference_input="https://i.imgur.com/Q6lDy8B.jpg")
+print(embeddings)
```

### Text Embedding

In the code below, we calculate a text embedding.

!!! tip

    This example assumes the `inference-sdk` package is installed:

    ```
    pip install inference-sdk
    ```

```python
-import requests
-
-# Define Request Payload
-infer_clip_payload = {
-    "text": "the quick brown fox jumped over the lazy dog",
-}
-
-res = requests.post(
-    f"{base_url}/clip/embed_text?api_key={api_key}",
-    json=infer_clip_payload,
-)
-
-embeddings = res.json()['embeddings']
+import os
+from inference_sdk import InferenceHTTPClient
+
+CLIENT = InferenceHTTPClient(
+    api_url="https://infer.roboflow.com",
+    api_key=os.environ["ROBOFLOW_API_KEY"],
+)
+
+embeddings = CLIENT.get_clip_text_embeddings(text="the quick brown fox jumped over the lazy dog")
+print(embeddings)
```

@@ -157,10 +157,27 @@ To compare embeddings for similarity, you can use cosine similarity.

The code you need to compare image and text embeddings is the same.

!!! tip

    This example assumes the `inference-sdk` package is installed:

    ```
    pip install inference-sdk
    ```

```python
-from inference.core.utils.postprocess import cosine_similarity
+import os
+from inference_sdk import InferenceHTTPClient

-similarity = cosine_similarity(image_embedding, text_embedding)
+CLIENT = InferenceHTTPClient(
+    api_url="https://infer.roboflow.com",
+    api_key=os.environ["ROBOFLOW_API_KEY"],
+)
+
+result = CLIENT.clip_compare(
+    subject="./image.jpg",
+    prompt=["dog", "cat"]
+)
+print(result)
```

The resulting number will be between 0 and 1. The higher the number, the more similar the image and text are.
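
If you would rather compare the raw embeddings computed earlier, a minimal sketch using the `cosine_similarity` helper from `inference` is shown below. The variable names and the `embeddings` response field are assumptions based on the examples above.

```python
from inference.core.utils.postprocess import cosine_similarity

# image_response / text_response stand in for the values returned by
# get_clip_image_embeddings and get_clip_text_embeddings above; we assume each
# carries an "embeddings" field holding one vector per input
image_vector = image_response["embeddings"][0]
text_vector = text_response["embeddings"][0]

similarity = cosine_similarity(image_vector, text_vector)
print(similarity)
```
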
58 changes: 20 additions & 38 deletions docs/foundation/cogvlm.md
@@ -16,65 +16,47 @@ To use CogVLM with Inference, you will need a Roboflow API key. If you don't alr

Then, retrieve your API key from the Roboflow dashboard. [Learn how to retrieve your API key](https://docs.roboflow.com/api-reference/authentication#retrieve-an-api-key).

Run the following command to set your API key in your development environment:

```
export ROBOFLOW_API_KEY=<your api key>
```

We recommend pairing CogVLM with the inference HTTP API running in a GPU environment. It's easy to set up
with our `inference-cli` tool. Run the following command to set up the environment and start the API under
`http://localhost:9001`:

```bash
pip install inference inference-cli inference-sdk
inference server start  # make sure you run this on a machine with a GPU! Otherwise CogVLM will not be available
```

Let's ask a question about the following image:

![A forklift in a warehouse](https://lh7-us.googleusercontent.com/4rgEU3nMJQzr54mYpGifEQp0hn3wu4oG8Sa21373M43eQ5TML-lBJyzYz3ZmPEETFwKnUGMmncsWA68wHo-4yzEGTV--TNCY7MJTxpJ-cS2w9JdUuIGVnwfAQN_72wK7TgGv-gtuLusJtAjAZxJVBFA)

Create a new Python file and use `inference-sdk` to prompt the model:

```python
-import base64
import os
-from PIL import Image
-import requests
+from inference_sdk import InferenceHTTPClient

-PORT = 9001
-API_KEY = os.environ["API_KEY"]
-IMAGE_PATH = "forklift.png"
-
-
-def encode_base64(image_path):
-    with open(image_path, "rb") as image:
-        x = image.read()
-        image_string = base64.b64encode(x)
-    return image_string.decode("ascii")
-
-prompt = "Is there a forklift close to a conveyor belt?"
-
-infer_payload = {
-    "image": {
-        "type": "base64",
-        "value": encode_base64(IMAGE_PATH),
-    },
-    "api_key": API_KEY,
-    "prompt": prompt,
-}
-
-results = requests.post(
-    f"http://localhost:{PORT}/llm/cogvlm",
-    json=infer_payload,
-)
-
-print(results.json())
+CLIENT = InferenceHTTPClient(
+    api_url="http://localhost:9001",  # only local hosting supported
+    api_key=os.environ["ROBOFLOW_API_KEY"]
+)
+
+result = CLIENT.prompt_cogvlm(
+    visual_prompt="./forklift.jpg",
+    text_prompt="Is there a forklift close to a conveyor belt?",
+)
+print(result)
```

Above, replace `forklift.jpg` with the path to the image you want to ask about.

Let's use the prompt "Is there a forklift close to a conveyor belt?"

Run the Python script you have created:

```bash
python app.py
```

The results of CogVLM will appear in your terminal:

37 changes: 7 additions & 30 deletions docs/foundation/doctr.md
@@ -19,43 +19,20 @@ Let's retrieve the text in the following image:
Create a new Python file and add the following code:

```python
-import requests
-import base64
-from PIL import Image
import os
-from io import BytesIO
+from inference_sdk import InferenceHTTPClient

-API_KEY = os.environ["API_KEY"]
-IMAGE = "container.jpeg"
-
-image = Image.open(IMAGE)
-buffered = BytesIO()
-
-image.save(buffered, quality=100, format="JPEG")
-
-img_str = base64.b64encode(buffered.getvalue())
-img_str = img_str.decode("ascii")
-
-data = {
-    "image": {
-        "type": "base64",
-        "value": img_str,
-    }
-}
-
-ocr_results = requests.post("http://localhost:9001/doctr/ocr?api_key=" + API_KEY, json=data).json()
-
-print(ocr_results)
+CLIENT = InferenceHTTPClient(
+    api_url="https://infer.roboflow.com",
+    api_key=os.environ["ROBOFLOW_API_KEY"]
+)
+
+result = CLIENT.ocr_image(inference_input="./container.jpg")  # single image request
+print(result)
```

Above, replace `container.jpg` with the path to the image you want to read text from.

Then, run the Python script you have created:

```
python app.py
```

The results of DocTR will appear in your terminal:

50 changes: 15 additions & 35 deletions docs/foundation/gaze.md
@@ -15,39 +15,25 @@ L2CS-Net accepts an image and returns pitch and yaw values that you can use to:
1. Figure out the direction in which someone is looking, and;
2. Estimate, roughly, where someone is looking.

We recommend using L2CS-Net paired with the inference HTTP API. It's easy to set up with our `inference-cli` tool. Run the
following command to set up the environment and run the API under `http://localhost:9001`:

```bash
pip install inference inference-cli inference-sdk
inference server start # this starts server under http://localhost:9001
```

Create a new Python file and add the following code:

```python
-import base64
-
-import cv2
-import numpy as np
-import requests
+import os
+from inference_sdk import InferenceHTTPClient

-IMG_PATH = "image.jpg"
-ROBOFLOW_API_KEY = os.environ["ROBOFLOW_API_KEY"]
-DISTANCE_TO_OBJECT = 1000  # mm
-HEIGHT_OF_HUMAN_FACE = 250  # mm
-GAZE_DETECTION_URL = "http://127.0.0.1:9001/gaze/gaze_detection?api_key=" + ROBOFLOW_API_KEY
-
-def detect_gazes(frame: np.ndarray):
-    img_encode = cv2.imencode(".jpg", frame)[1]
-    img_base64 = base64.b64encode(img_encode)
-    resp = requests.post(
-        GAZE_DETECTION_URL,
-        json={
-            "api_key": ROBOFLOW_API_KEY,
-            "image": {"type": "base64", "value": img_base64.decode("utf-8")},
-        },
-    )
-    gazes = resp.json()[0]["predictions"]
-    return gazes
-
-image = cv2.imread(IMG_PATH)
-gazes = detect_gazes(image)
-print(gazes)
+CLIENT = InferenceHTTPClient(
+    api_url="http://localhost:9001",  # only local hosting supported
+    api_key=os.environ["ROBOFLOW_API_KEY"]
+)
+
+gazes = CLIENT.detect_gazes(inference_input="./image.jpg")  # single image request
+print(gazes)
```

Above, replace `image.jpg` with the image in which you want to detect gazes.
@@ -59,12 +45,6 @@ The code above makes two assumptions:

These assumptions are a good starting point if you are using a computer webcam with L2CS-Net, where people in the frame are likely to be sitting at a desk.
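
If you want a rough estimate of where someone is looking, you can project the returned `pitch` and `yaw` angles onto a plane at an assumed distance from the camera. The sketch below is a simplified approximation rather than the library's own projection; it reuses `CLIENT` from the example above, assumes the response mirrors the HTTP payload (a list whose first element has a `predictions` field), and uses the hypothetical 1 m camera-to-face distance mentioned above.

```python
import math

DISTANCE_TO_OBJECT = 1000  # mm - assumed distance between the camera and the face

# assuming the SDK returns the parsed HTTP payload: a list whose first element
# holds a "predictions" list with "yaw" and "pitch" angles in radians
gaze = CLIENT.detect_gazes(inference_input="./image.jpg")[0]["predictions"][0]

# rough planar projection of the gaze direction (millimetres relative to the face);
# sign conventions depend on your camera setup
dx = DISTANCE_TO_OBJECT * math.tan(gaze["yaw"])
dy = DISTANCE_TO_OBJECT * math.tan(gaze["pitch"])
print(f"approximate gaze offset: x={dx:.0f} mm, y={dy:.0f} mm")
```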

Then, run the Python script you have created:

```
python app.py
```

On the first run, the model will be downloaded. On subsequent runs, the model will be cached locally and loaded from the cache. It will take a few moments for the model to download.

The results of L2CS-Net will appear in your terminal: