Merge pull request #184 from dusty-nv/20240727-ultralytics
updated pages
dusty-nv authored Jul 28, 2024
2 parents 5c1f5cc + ea7b60c commit 84cbedf
Showing 6 changed files with 95 additions and 39 deletions.
4 changes: 3 additions & 1 deletion docs/agent_studio.md
@@ -152,6 +152,8 @@ Below are descriptions of commonly-used components. Help text for these is extra
* <text>
* <text>
* <text>
These most recent inputs are used in newest to oldest order from a LIFO queue.
```

=== "UserPrompt"
@@ -303,4 +305,4 @@ Many of the previous demos (like Llamaspeak and Live Llava) can quickly be recre
* <text>
```

🤖 Have fun bot building! If you need help, reach out on the [Jetson Forums](https://forums.developer.nvidia.com/c/agx-autonomous-machines/jetson-embedded-systems/jetson-projects/78) or [GitHub Issues](https://github.com/dusty-nv/NanoLLM).
2 changes: 1 addition & 1 deletion docs/tutorial_api-examples.md
@@ -83,7 +83,7 @@ The [`huggingface-benchmark.py`](https://github.com/dusty-nv/jetson-containers/b

## NanoLLM

The [`NanoLLM`](https://dusty-nv.github.io/NanoLLM) library uses the optimized MLC/TVM library for inference, like on the [Benchmarks](benchmarks.md) page:
The [`NanoLLM`](tutorial_nano-llm.md) library uses the optimized MLC/TVM library for inference, like on the [Benchmarks](benchmarks.md) page:

<a href="benchmarks.html"><iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vTJ9lFqOIZSfrdnS_0sa2WahzLbpbAbBCTlS049jpOchMCum1hIk-wE_lcNAmLkrZd0OQrI9IkKBfGp/pubchart?oid=2126319913&amp;format=interactive"></iframe></a>

11 changes: 5 additions & 6 deletions docs/tutorial_live-llava.md
@@ -96,10 +96,6 @@ jetson-containers run $(autotag nano_llm) \

You can also tag incoming images and add them to the database using the web UI, for one-shot recognition tasks:

<div><iframe width="500" height="280" src="https://www.youtube.com/embed/8Eu6zG0eEGY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

<iframe width="500" height="280" src="https://www.youtube.com/embed/wZq7ynbgRoE" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe></div>

## Video VILA

The VILA-1.5 family of models can understand multiple images per query, enabling video search/summarization, action & behavior analysis, change detection, and other temporal-based vision functions. The [`vision/video.py`](https://github.com/dusty-nv/NanoLLM/blob/main/nano_llm/vision/video.py){:target="_blank"} example keeps a rolling history of frames:
@@ -117,8 +113,6 @@ jetson-containers run $(autotag nano_llm) \

<a href="https://youtu.be/_7gughth8C0" target="_blank"><img src="images/video_vila_wildfire.gif" title="Link to YouTube video of more clips (Realtime Video Vision/Language Model with VILA1.5-3b and Jetson Orin)"></a>

<small>Note: support will be added to the web UI for continuous multi-image queries on video sequences.</small>

## Python Code

For a simplified code example of doing live VLM streaming from Python, see [here](https://dusty-nv.github.io/NanoLLM/multimodal.html#code-example){:target="_blank"} in the NanoLLM docs.
@@ -127,3 +121,8 @@ For a simplified code example of doing live VLM streaming from Python, see [here

You can use this to implement customized prompting techniques and integrate with other vision pipelines. This code applies the same set of prompts to the latest image from the video feed. See [here](https://github.com/dusty-nv/NanoLLM/blob/main/nano_llm/vision/video.py){:target="_blank"} for the version that does multi-image queries on video sequences.
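Below is a minimal sketch of that single-frame pattern, assuming the `VideoSource`, `ChatHistory`, and `generate()` interfaces from the NanoLLM multimodal docs; the camera device, model, and prompts are placeholders, and this is not the actual `vision/video.py` source:

```python
# Illustrative sketch: apply a fixed set of prompts to the latest camera frame.
# Follows the pattern from the NanoLLM multimodal example; model, device,
# and prompts below are placeholder values.
from nano_llm import NanoLLM, ChatHistory
from nano_llm.plugins import VideoSource

model = NanoLLM.from_pretrained("Efficient-Large-Model/VILA1.5-3b", api='mlc')
chat_history = ChatHistory(model)
video_source = VideoSource("/dev/video0")

prompts = ["Describe the scene concisely.", "Are there any people visible?"]

while True:
    img = video_source.capture()
    if img is None:          # timed out waiting for a frame
        continue

    for prompt in prompts:
        chat_history.append('user', image=img)   # attach the latest frame
        chat_history.append('user', prompt)      # then ask the question
        embedding, _ = chat_history.embed_chat()

        reply = model.generate(embedding, kv_cache=chat_history.kv_cache,
                               max_new_tokens=48)
        for token in reply:
            print(token, end='', flush=True)
        print()

    chat_history.reset()     # start fresh for the next frame
```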

## Walkthrough Videos

<div><iframe width="500" height="280" src="https://www.youtube.com/embed/wZq7ynbgRoE" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/8Eu6zG0eEGY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe></div>

45 changes: 33 additions & 12 deletions docs/tutorial_nano-llm.md
@@ -1,8 +1,11 @@
# NanoLLM - Optimized LLM Inference

[`NanoLLM`](https://dusty-nv.github.io/NanoLLM){:target="_blank"} is a lightweight, high-performance library using optimized inferencing APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends. It's used to build many of the responsive, low-latency agents featured on this site.
[`NanoLLM`](https://dusty-nv.github.io/NanoLLM){:target="_blank"} is a lightweight, high-performance library using optimized inferencing APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends like [Agent Studio](agent_studio.md).

<a href="https://dusty-nv.github.io/NanoLLM" target="_blank"><img src="./images/nano_llm_docs.jpg" style="max-width: 50%; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.4);"></img></a>
<video controls autoplay muted style="max-width: 75%">
<source src="images/agent_studio.mp4" type="video/mp4">
</video>

It provides <a href="tutorial_api-examples.html#nanollm" target="_blank">similar APIs</a> to HuggingFace, backed by highly-optimized inference libraries and quantization tools:
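For reference, a minimal sketch of that interface, following the chat example in the NanoLLM documentation (the model name, API token, and quantization values are placeholders):

```python
# Minimal NanoLLM usage, mirroring the HuggingFace-style API described above.
# The model name, API token, and quantization are placeholder/example values.
from nano_llm import NanoLLM

model = NanoLLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # HuggingFace repo/model name, or local path
    api='mlc',                              # backend: mlc, awq, or hf
    api_token='hf_abc123def',               # HuggingFace token for gated models
    quantization='q4f16_ft',                # 4-bit quantization for MLC
)

# generate() streams the reply token-by-token
response = model.generate("Once upon a time,", max_new_tokens=128)

for token in response:
    print(token, end='', flush=True)
```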

@@ -30,14 +33,29 @@ To test a chat session with Llama from the command-line, install [`jetson-contai
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh
```

=== "Llama CLI"

```bash
jetson-containers run \
--env HUGGINGFACE_TOKEN=hf_abc123def \
$(autotag nano_llm) \
python3 -m nano_llm.chat --api mlc \
--model meta-llama/Meta-Llama-3-8B-Instruct \
--prompt "Can you tell me a joke about llamas?"
```

=== "Agent Studio"

```
jetson-containers run \
--env HUGGINGFACE_TOKEN=hf_abc123def \
$(autotag nano_llm) \
python3 -m nano_llm.studio
```




If you haven't already, request access to the [Llama models](https://huggingface.co/meta-llama){:target="_blank"} on HuggingFace and substitute your account's API token above.

@@ -54,12 +72,15 @@ Here's an index of the various tutorials & examples using NanoLLM on Jetson AI L
| **[Small LLM (SLM)](./tutorial_slm.md){:target="_blank"}** | Focus on language models with reduced footprint (7B params and below) |
| **[Live LLaVA](./tutorial_live-llava.md){:target="_blank"}** | Realtime live-streaming vision/language models on recurring prompts. |
| **[Nano VLM](./tutorial_nano-vlm.md){:target="_blank"}** | Efficient multimodal pipeline with one-shot image tagging and RAG support. |
| **[Agent Studio](./agent_studio.md){:target="_blank"}** | Rapidly design and experiment with creating your own automation agents. |

<div><iframe width="500" height="280" src="https://www.youtube.com/embed/UOjqF3YCGkY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/8Eu6zG0eEGY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/wZq7ynbgRoE" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div>
<div><iframe width="500" height="280" src="https://www.youtube.com/embed/hswNSZTvEFE" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/wZq7ynbgRoE" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/_7gughth8C0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/8Eu6zG0eEGY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/9ozwh9EDGhU" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

</div>

3 changes: 1 addition & 2 deletions docs/tutorial_nano-vlm.md
@@ -144,8 +144,7 @@ jetson-containers run $(autotag nano_llm) \
```

<iframe width="720" height="405" src="https://www.youtube.com/embed/_7gughth8C0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<small>Note: support will be added to the web UI for continuous multi-image queries on video sequences and is WIP.</small>


## Python Code

For a simplified code example of doing live VLM streaming from Python, see [here](https://dusty-nv.github.io/NanoLLM/multimodal.html#code-example){:target="_blank"} in the NanoLLM docs.
69 changes: 52 additions & 17 deletions docs/tutorial_ultralytics.md
@@ -28,28 +28,26 @@ Let's run [Ultralytics](https://www.ultralytics.com) YOLOv8 on Jetson with [NVID

Execute the commands below according to your JetPack version to pull the corresponding Docker container and run it on Jetson.

=== "JetPack 4"

    ```bash
    t=ultralytics/ultralytics:latest-jetson-jetpack4
    sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t
    ```

=== "JetPack 5"

    ```bash
    t=ultralytics/ultralytics:latest-jetson-jetpack5
    sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t
    ```

=== "JetPack 6"

    ```bash
    t=ultralytics/ultralytics:latest-jetson-jetpack6
    sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t
    ```

## Convert model to TensorRT and run inference

@@ -94,6 +92,43 @@ The YOLOv8n model in PyTorch format is converted to TensorRT to run inference wi

Visit the [Export page](https://docs.ultralytics.com/modes/export) to access additional arguments when exporting models to different formats. Note that with the default `dynamic=False`, inference requires fixed image dimensions. To change the input source for inference, refer to the [Model Prediction](https://docs.ultralytics.com/modes/predict/#inference-sources) page.
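For reference, a minimal sketch of that export-and-infer flow with the Ultralytics Python API (file names are the library defaults; the test image URL is just an example):

```python
from ultralytics import YOLO

# Load the PyTorch model and export it to a TensorRT engine (writes 'yolov8n.engine').
# With the default dynamic=False, the engine expects fixed input dimensions (imgsz).
model = YOLO("yolov8n.pt")
model.export(format="engine", imgsz=640, half=True)  # half=True builds an FP16 engine

# Load the exported engine and run inference
trt_model = YOLO("yolov8n.engine")
results = trt_model("https://ultralytics.com/images/bus.jpg")

for r in results:
    print(r.boxes)  # detected bounding boxes
```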

## Benchmarks

Benchmarks of the YOLOv8 variants with TensorRT were run by [Seeed Studio](https://www.seeedstudio.com/blog/2023/03/30/yolov8-performance-benchmarks-on-nvidia-jetson-devices/) on their [reComputer](https://www.seeedstudio.com/nvidia-jetson.html) systems:

<img src="https://www.seeedstudio.com/blog/wp-content/uploads/2023/03/image-26-1030x611.png" width="100%">

=== "Xavier NX 8GB"

| Model | PyTorch | FP32 | FP16 | INT8 |
|---------|:-------:|:----:|:----:|:----:|
| YOLOv8n | 32 | 63 | 120 | 167 |
| YOLOv8s | 25 | 26 | 69 | 112 |
| YOLOv8m | 11 | 11 | 33 | 56 |
| YOLOv8l | 6 | 6 | 20 | 38 |

=== "Orin NX 8GB"

| Model | PyTorch | FP32 | FP16 | INT8 |
|---------|:-------:|:----:|:----:|:----:|
| YOLOv8n | 56 | 115 | 204 | 256 |
| YOLOv8s | 53 | 67 | 128 | 196 |
| YOLOv8m | 26 | 31 | 63 | 93 |
| YOLOv8l | 16 | 20 | 42 | 69 |

=== "AGX Orin 32GB"

| Model | PyTorch | FP32 | FP16 | INT8 |
|---------|:-------:|:----:|:----:|:----:|
| YOLOv8n | 77 | 192 | 323 | 385 |
| YOLOv8s | 67 | 119 | 213 | 303 |
| YOLOv8m | 40 | 56 | 105 | 145 |
| YOLOv8l | 27 | 38 | 73.5 | 114 |


* FP32/FP16/INT8 with TensorRT (frames per second)
* The original post with these benchmarks can be found [here](https://www.seeedstudio.com/blog/2023/03/30/yolov8-performance-benchmarks-on-nvidia-jetson-devices/)
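
To produce comparable numbers on your own Jetson, the Ultralytics package also ships a benchmarking helper; a minimal sketch (argument values are illustrative):

```python
# Sketch: benchmark YOLOv8n across export formats on the local device.
# Argument values here are illustrative; see the Ultralytics benchmark docs for details.
from ultralytics.utils.benchmarks import benchmark

benchmark(
    model="yolov8n.pt",   # model to benchmark
    data="coco8.yaml",    # small validation dataset
    imgsz=640,            # inference image size
    half=False,           # set True to include FP16
    device=0,             # GPU index on the Jetson
)
```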

## Further reading

To learn more, visit our [comprehensive guide on running Ultralytics YOLOv8 on NVIDIA Jetson](https://docs.ultralytics.com/guides/nvidia-jetson), which includes benchmarks!
