Merge pull request #184 from dusty-nv/20240727-ultralytics
updated pages
dusty-nv authored Jul 28, 2024
2 parents 5c1f5cc + ea7b60c commit 84cbedf
Showing 6 changed files with 95 additions and 39 deletions.
4 changes: 3 additions & 1 deletion docs/agent_studio.md
@@ -152,6 +152,8 @@ Below are descriptions of commonly-used components. Help text for these is extra
* <text>
* <text>
* <text>
These most recent inputs are used in newest to oldest order from a LIFO queue.
```

=== "UserPrompt"
@@ -303,4 +305,4 @@ Many of the previous demos (like Llamaspeak and Live Llava) can quickly be recre
* <text>
```

🤖 Have fun bot building! If you need help, reach out on the [Jetson Forums](https://forums.developer.nvidia.com/c/agx-autonomous-machines/jetson-embedded-systems/jetson-projects/78) or [GitHub Issues](https://github.com/dusty-nv/NanoLLM).
2 changes: 1 addition & 1 deletion docs/tutorial_api-examples.md
@@ -83,7 +83,7 @@ The [`huggingface-benchmark.py`](https://github.com/dusty-nv/jetson-containers/b

## NanoLLM

The [`NanoLLM`](https://dusty-nv.github.io/NanoLLM) library uses the optimized MLC/TVM library for inference, like on the [Benchmarks](benchmarks.md) page:
The [`NanoLLM`](tutorial_nano-llm.md) library uses the optimized MLC/TVM library for inference, like on the [Benchmarks](benchmarks.md) page:

<a href="benchmarks.html"><iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vTJ9lFqOIZSfrdnS_0sa2WahzLbpbAbBCTlS049jpOchMCum1hIk-wE_lcNAmLkrZd0OQrI9IkKBfGp/pubchart?oid=2126319913&amp;format=interactive"></iframe></a>

11 changes: 5 additions & 6 deletions docs/tutorial_live-llava.md
@@ -96,10 +96,6 @@ jetson-containers run $(autotag nano_llm) \

You can also tag incoming images and add them to the database using the web UI, for one-shot recognition tasks:

<div><iframe width="500" height="280" src="https://www.youtube.com/embed/8Eu6zG0eEGY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

<iframe width="500" height="280" src="https://www.youtube.com/embed/wZq7ynbgRoE" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe></div>

## Video VILA

The VILA-1.5 family of models can understand multiple images per query, enabling video search/summarization, action & behavior analysis, change detection, and other temporal-based vision functions. The [`vision/video.py`](https://github.com/dusty-nv/NanoLLM/blob/main/nano_llm/vision/video.py){:target="_blank"} example keeps a rolling history of frames:
@@ -117,8 +113,6 @@ jetson-containers run $(autotag nano_llm) \

<a href="https://youtu.be/_7gughth8C0" target="_blank"><img src="images/video_vila_wildfire.gif" title="Link to YouTube video of more clips (Realtime Video Vision/Language Model with VILA1.5-3b and Jetson Orin)"></a>

<small>Note: support will be added to the web UI for continuous multi-image queries on video sequences.</small>

## Python Code

For a simplified code example of doing live VLM streaming from Python, see [here](https://dusty-nv.github.io/NanoLLM/multimodal.html#code-example){:target="_blank"} in the NanoLLM docs.
@@ -127,3 +121,8 @@ For a simplified code example of doing live VLM streaming from Python, see [here

You can use this to implement customized prompting techniques and integrate with other vision pipelines. This code applies the same set of prompts to the latest image from the video feed. See [here](https://github.com/dusty-nv/NanoLLM/blob/main/nano_llm/vision/video.py){:target="_blank"} for the version that does multi-image queries on video sequences.
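Below is a minimal sketch of that single-frame pattern, assuming the `VideoSource`, `ChatHistory`, and `generate()` interfaces from the NanoLLM multimodal docs; the camera device, model, and prompts are placeholders, and this is not the actual `vision/video.py` source:

```python
# Illustrative sketch: apply a fixed set of prompts to the latest camera frame.
# Follows the pattern from the NanoLLM multimodal example; model, device,
# and prompts below are placeholder values.
from nano_llm import NanoLLM, ChatHistory
from nano_llm.plugins import VideoSource

model = NanoLLM.from_pretrained("Efficient-Large-Model/VILA1.5-3b", api='mlc')
chat_history = ChatHistory(model)
video_source = VideoSource("/dev/video0")

prompts = ["Describe the scene concisely.", "Are there any people visible?"]

while True:
    img = video_source.capture()
    if img is None:          # timed out waiting for a frame
        continue

    for prompt in prompts:
        chat_history.append('user', image=img)   # attach the latest frame
        chat_history.append('user', prompt)      # then ask the question
        embedding, _ = chat_history.embed_chat()

        reply = model.generate(embedding, kv_cache=chat_history.kv_cache,
                               max_new_tokens=48)
        for token in reply:
            print(token, end='', flush=True)
        print()

    chat_history.reset()     # start fresh for the next frame
```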

## Walkthrough Videos

<div><iframe width="500" height="280" src="https://www.youtube.com/embed/wZq7ynbgRoE" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/8Eu6zG0eEGY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe></div>

45 changes: 33 additions & 12 deletions docs/tutorial_nano-llm.md
@@ -1,8 +1,11 @@
# NanoLLM - Optimized LLM Inference

[`NanoLLM`](https://dusty-nv.github.io/NanoLLM){:target="_blank"} is a lightweight, high-performance library using optimized inferencing APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends. It's used to build many of the responsive, low-latency agents featured on this site.
[`NanoLLM`](https://dusty-nv.github.io/NanoLLM){:target="_blank"} is a lightweight, high-performance library using optimized inferencing APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends like [Agent Studio](agent_studio.md).

<a href="https://dusty-nv.github.io/NanoLLM" target="_blank"><img src="./images/nano_llm_docs.jpg" style="max-width: 50%; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.4);"></img></a>
<video controls autoplay muted style="max-width: 75%">
<source src="images/agent_studio.mp4" type="video/mp4">
</video>

It provides <a href="tutorial_api-examples.html#nanollm" target="_blank">similar APIs</a> to HuggingFace, backed by highly-optimized inference libraries and quantization tools:
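For reference, a minimal sketch of that interface, following the chat example in the NanoLLM documentation (the model name, API token, and quantization values are placeholders):

```python
# Minimal NanoLLM usage, mirroring the HuggingFace-style API described above.
# The model name, API token, and quantization are placeholder/example values.
from nano_llm import NanoLLM

model = NanoLLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # HuggingFace repo/model name, or local path
    api='mlc',                              # backend: mlc, awq, or hf
    api_token='hf_abc123def',               # HuggingFace token for gated models
    quantization='q4f16_ft',                # 4-bit quantization for MLC
)

# generate() streams the reply token-by-token
response = model.generate("Once upon a time,", max_new_tokens=128)

for token in response:
    print(token, end='', flush=True)
```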

@@ -30,14 +33,29 @@ To test a chat session with Llama from the command-line, install [`jetson-contai
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh
```

=== "Llama CLI"

```bash
jetson-containers run \
--env HUGGINGFACE_TOKEN=hf_abc123def \
$(autotag nano_llm) \
python3 -m nano_llm.chat --api mlc \
--model meta-llama/Meta-Llama-3-8B-Instruct \
--prompt "Can you tell me a joke about llamas?"
```

=== "Agent Studio"

```
jetson-containers run \
--env HUGGINGFACE_TOKEN=hf_abc123def \
$(autotag nano_llm) \
python3 -m nano_llm.studio
```




If you haven't already, request access to the [Llama models](https://huggingface.co/meta-llama){:target="_blank"} on HuggingFace and substitute your account's API token above.

@@ -54,12 +72,15 @@ Here's an index of the various tutorials & examples using NanoLLM on Jetson AI L
| **[Small LLM (SLM)](./tutorial_slm.md){:target="_blank"}** | Focus on language models with reduced footprint (7B params and below) |
| **[Live LLaVA](./tutorial_live-llava.md){:target="_blank"}** | Realtime live-streaming vision/language models on recurring prompts. |
| **[Nano VLM](./tutorial_nano-vlm.md){:target="_blank"}** | Efficient multimodal pipeline with one-shot image tagging and RAG support. |
| **[Agent Studio](./agent_studio.md){:target="_blank"}** | Rapidly design and experiment with creating your own automation agents. |

<div><iframe width="500" height="280" src="https://www.youtube.com/embed/UOjqF3YCGkY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/8Eu6zG0eEGY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/wZq7ynbgRoE" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div>
<div><iframe width="500" height="280" src="https://www.youtube.com/embed/hswNSZTvEFE" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/wZq7ynbgRoE" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/_7gughth8C0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/8Eu6zG0eEGY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/9ozwh9EDGhU" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

</div>

3 changes: 1 addition & 2 deletions docs/tutorial_nano-vlm.md
@@ -144,8 +144,7 @@ jetson-containers run $(autotag nano_llm) \
```

<iframe width="720" height="405" src="https://www.youtube.com/embed/_7gughth8C0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<small>Note: support will be added to the web UI for continuous multi-image queries on video sequences and is WIP.</small>


## Python Code

For a simplified code example of doing live VLM streaming from Python, see [here](https://dusty-nv.github.io/NanoLLM/multimodal.html#code-example){:target="_blank"} in the NanoLLM docs.
69 changes: 52 additions & 17 deletions docs/tutorial_ultralytics.md
@@ -28,28 +28,26 @@ Let's run [Ultralytics](https://www.ultralytics.com) YOLOv8 on Jetson with [NVID

Execute the commands below according to your JetPack version to pull the corresponding Docker container and run it on Jetson.

=== "JetPack 4"

    ```bash
    t=ultralytics/ultralytics:latest-jetson-jetpack4
    sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t
    ```

=== "JetPack 5"

    ```bash
    t=ultralytics/ultralytics:latest-jetson-jetpack5
    sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t
    ```

=== "JetPack 6"

    ```bash
    t=ultralytics/ultralytics:latest-jetson-jetpack6
    sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t
    ```

## Convert model to TensorRT and run inference

@@ -94,6 +92,43 @@ The YOLOv8n model in PyTorch format is converted to TensorRT to run inference wi

Visit the [Export page](https://docs.ultralytics.com/modes/export) to access additional arguments when exporting models to different formats. Note that with the default `dynamic=False`, inference requires fixed image dimensions. To change the input source for inference, refer to the [Model Prediction](https://docs.ultralytics.com/modes/predict/#inference-sources) page.
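For reference, a minimal sketch of that export-and-infer flow with the Ultralytics Python API (file names are the library defaults; the test image URL is just an example):

```python
from ultralytics import YOLO

# Load the PyTorch model and export it to a TensorRT engine (writes 'yolov8n.engine').
# With the default dynamic=False, the engine expects fixed input dimensions (imgsz).
model = YOLO("yolov8n.pt")
model.export(format="engine", imgsz=640, half=True)  # half=True builds an FP16 engine

# Load the exported engine and run inference
trt_model = YOLO("yolov8n.engine")
results = trt_model("https://ultralytics.com/images/bus.jpg")

for r in results:
    print(r.boxes)  # detected bounding boxes
```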

## Benchmarks

Benchmarks of the YOLOv8 variants with TensorRT were run by [Seeed Studio](https://www.seeedstudio.com/blog/2023/03/30/yolov8-performance-benchmarks-on-nvidia-jetson-devices/) on their [reComputer](https://www.seeedstudio.com/nvidia-jetson.html) systems:

<img src="https://www.seeedstudio.com/blog/wp-content/uploads/2023/03/image-26-1030x611.png" width="100%">

=== "Xavier NX 8GB"

| Model | PyTorch | FP32 | FP16 | INT8 |
|---------|:-------:|:----:|:----:|:----:|
| YOLOv8n | 32 | 63 | 120 | 167 |
| YOLOv8s | 25 | 26 | 69 | 112 |
| YOLOv8m | 11 | 11 | 33 | 56 |
| YOLOv8l | 6 | 6 | 20 | 38 |

=== "Orin NX 8GB"

| Model | PyTorch | FP32 | FP16 | INT8 |
|---------|:-------:|:----:|:----:|:----:|
| YOLOv8n | 56 | 115 | 204 | 256 |
| YOLOv8s | 53 | 67 | 128 | 196 |
| YOLOv8m | 26 | 31 | 63 | 93 |
| YOLOv8l | 16 | 20 | 42 | 69 |

=== "AGX Orin 32GB"

| Model | PyTorch | FP32 | FP16 | INT8 |
|---------|:-------:|:----:|:----:|:----:|
| YOLOv8n | 77 | 192 | 323 | 385 |
| YOLOv8s | 67 | 119 | 213 | 303 |
| YOLOv8m | 40 | 56 | 105 | 145 |
| YOLOv8l | 27 | 38 | 73.5 | 114 |


* FP32/FP16/INT8 with TensorRT (frames per second)
* The original post with these benchmarks can be found [here](https://www.seeedstudio.com/blog/2023/03/30/yolov8-performance-benchmarks-on-nvidia-jetson-devices/)
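
To produce comparable numbers on your own Jetson, the Ultralytics package also ships a benchmarking helper; a minimal sketch (argument values are illustrative):

```python
# Sketch: benchmark YOLOv8n across export formats on the local device.
# Argument values here are illustrative; see the Ultralytics benchmark docs for details.
from ultralytics.utils.benchmarks import benchmark

benchmark(
    model="yolov8n.pt",   # model to benchmark
    data="coco8.yaml",    # small validation dataset
    imgsz=640,            # inference image size
    half=False,           # set True to include FP16
    device=0,             # GPU index on the Jetson
)
```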

## Further reading

To learn more, visit our [comprehensive guide on running Ultralytics YOLOv8 on NVIDIA Jetson](https://docs.ultralytics.com/guides/nvidia-jetson), which includes benchmarks!
