Merge pull request #249 from johnnynunez/main
cosmos documentation
dusty-nv authored Jan 15, 2025
2 parents 2fc2f0c + ea99d49 commit a09eb8a
Showing 6 changed files with 149 additions and 6 deletions.
141 changes: 141 additions & 0 deletions docs/cosmos.md
@@ -0,0 +1,141 @@
# Cosmos - World Foundation Models

[Cosmos](https://github.com/NVIDIA/Cosmos) is a world model development platform consisting of world foundation
models, tokenizers, and a video processing pipeline that accelerate the development of Physical AI at robotics and
autonomous vehicle (AV) labs. Cosmos is purpose-built for Physical AI. The Cosmos repository lets end users run the
Cosmos models, run inference scripts, and generate videos.

<img src="images/cosmos_jetson.jpg" style="max-width:800px;">

> Special thanks to [Johnny Núñez Cano](https://www.linkedin.com/in/johnnycano/) for porting Cosmos and Transformer
> Engine to Jetson!
> See the [official Cosmos page](https://www.nvidia.com/en-us/ai/cosmos/) by NVIDIA.
> See [Transformer Engine](https://github.com/NVIDIA/TransformerEngine) by NVIDIA.

!!! abstract "What you need"

1. One of the following Jetson devices:

<span class="blobDarkGreen4">Jetson Thor (XGB)</span>
<span class="blobDarkGreen4">Jetson AGX Orin (64GB)</span>
<span class="blobDarkGreen5">Jetson AGX Orin (32GB)</span>

2. Running one of the following versions of [JetPack](https://developer.nvidia.com/embedded/jetpack):

<span class="blobPink2">JetPack 6 (L4T r36.x)</span>

3. Sufficient storage space (preferably with NVMe SSD).

- `12.26GB` for [`cosmos`](https://hub.docker.com/r/dustynv/cosmos) container image
- Space for models and datasets (`>50GB`)
4. Clone and set up [`jetson-containers`](https://github.com/dusty-nv/jetson-containers/blob/master/docs/setup.md){:target="_blank"}:

```bash
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh
```
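
Before pulling the image and downloading models, it can help to confirm that the target drive has enough room. The following is a minimal sketch, not part of `jetson-containers`: the `has_free_space` helper is hypothetical, and the 63 GB threshold is an assumption derived from the figures above (about 12.26 GB for the image plus more than 50 GB for models and datasets).

```bash
# Sketch: check that the filesystem holding a directory has at least N GB free.
# The 63 GB default is an assumption (12.26 GB image + >50 GB of models), not an official figure.
has_free_space() {
    local dir="$1" required_gb="${2:-63}" avail_kb
    # -P forces single-line POSIX output; -k reports sizes in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

if has_free_space . 63; then
    echo "enough free space in $(pwd)"
else
    echo "less than 63 GB free in $(pwd)" >&2
fi
```

Run it from the directory where you plan to store the models, for example a mount point on the NVMe SSD.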

!!! abstract "WARNING"
[Transformer Engine](https://github.com/NVIDIA/TransformerEngine):

- Cosmos is optimized for the NVIDIA Ada GPU architecture and later generations because it runs in FP8.
- Jetson AGX Orin is based on Ampere, which has no FP8 support.
- Transformer Engine still supports optimizations for the FP16 and BF16 precisions on NVIDIA Ampere and later GPU architectures.
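
The architecture split above can be expressed in terms of CUDA compute capability. The sketch below assumes that FP8 execution requires compute capability 8.9 (Ada) or higher, and that Jetson AGX Orin reports 8.7 (Ampere). The `supports_fp8` helper is hypothetical and only illustrates the rule. It does not query the GPU.

```bash
# Sketch: decide from a compute capability string whether FP8 is available.
# Assumption: FP8 needs compute capability >= 8.9 (Ada 8.9, Hopper 9.0); Orin is 8.7.
supports_fp8() {
    local major="${1%%.*}" minor="${1#*.}"
    [ "$major" -gt 8 ] || { [ "$major" -eq 8 ] && [ "$minor" -ge 9 ]; }
}

for cc in 8.7 8.9 9.0; do
    if supports_fp8 "$cc"; then
        echo "compute capability $cc: FP8 available"
    else
        echo "compute capability $cc: no FP8, use FP16/BF16"
    fi
done
```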

## Start Container

Use this command to automatically run, build, or pull a compatible container image for cosmos:

```bash
jetson-containers run $(autotag cosmos)
```

To mount your own directories into the container, use the
[`-v`](https://docs.docker.com/engine/reference/commandline/run/#volume) or
[`--volume`](https://docs.docker.com/engine/reference/commandline/run/#volume) flags:

```bash
jetson-containers run -v /path/on/host:/path/in/container $(autotag cosmos)
```
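
A mistyped host path is an easy mistake to make with `-v`, and a non-absolute source may be interpreted by Docker as a named volume rather than a directory. As a rough sketch, the hypothetical `volume_arg` helper below (not part of `jetson-containers`) checks that the host directory exists and prints an absolute `-v host:container` pair. The container path `/workspace` is only an example.

```bash
# Sketch: build an absolute "-v host:container" argument, failing early on a bad host path.
# volume_arg is a hypothetical helper for illustration only.
volume_arg() {
    local host_dir="$1" container_dir="$2" abs_dir
    abs_dir=$(cd "$host_dir" 2>/dev/null && pwd) || {
        echo "host directory not found: $host_dir" >&2
        return 1
    }
    printf '%s %s:%s\n' -v "$abs_dir" "$container_dir"
}

# Print the argument for mounting the current directory at /workspace.
volume_arg . /workspace
```

The printed pair can be pasted into the `jetson-containers run` command above. Paths containing spaces need to be quoted by hand.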

Recommended: clone the Cosmos repository on the host and mount it into the container, so that all models are downloaded outside the Docker container:

```bash
git clone --recursive https://github.com/NVIDIA/Cosmos.git
cd Cosmos
jetson-containers run -it -v $(pwd):/workspace $(autotag cosmos)
```

## Follow the Instructions from the Cosmos Repository

Here are the summarized steps to run the Cosmos models:

Generate a [Hugging Face](https://huggingface.co/settings/tokens) access token with 'Read' permission
(the default token type is 'Fine-grained'), then log in:

```bash
huggingface-cli login
```
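
For scripted setups, the login can also be done without the interactive prompt. The sketch below assumes that the token has been exported as `HF_TOKEN` and uses the `--token` option of `huggingface-cli login`. The `hf_login` wrapper itself is hypothetical.

```bash
# Sketch: log in non-interactively when a token is available in HF_TOKEN.
# hf_login is a hypothetical wrapper; it refuses to run without a token.
hf_login() {
    if [ -z "${HF_TOKEN:-}" ]; then
        echo "HF_TOKEN is not set; run 'huggingface-cli login' interactively instead" >&2
        return 1
    fi
    huggingface-cli login --token "$HF_TOKEN"
}
```

Call `hf_login` inside the container after exporting the token, and avoid committing the token to shell history or scripts.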

Download the models:

```bash
PYTHONPATH=$(pwd) python3 cosmos1/scripts/download_diffusion.py --model_sizes 7B 14B --model_types Text2World Video2World
```
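
The downloads are large, so it is worth checking that they completed before starting inference. The sketch below assumes that each model ends up in its own directory under `checkpoints/`, as the `--checkpoint_dir` and `--diffusion_transformer_dir` arguments of the inference command suggest. The `missing_checkpoints` helper is hypothetical.

```bash
# Sketch: list expected model directories that are absent under a checkpoint root.
# Assumption: each model lives in <root>/<model name>; adjust the names to what you downloaded.
missing_checkpoints() {
    local root="$1" name missing=0
    shift
    for name in "$@"; do
        if [ ! -d "$root/$name" ]; then
            echo "missing: $root/$name"
            missing=1
        fi
    done
    return "$missing"
}

missing_checkpoints checkpoints Cosmos-1.0-Diffusion-7B-Text2World \
    || echo "re-run the download step for the entries listed above"
```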

Run the demo:

```bash
PROMPT="A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. \
The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. \
A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, \
suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. \
The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of \
field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."
```

```bash
PYTHONPATH=$(pwd) python3 cosmos1/models/diffusion/inference/text2world.py \
--checkpoint_dir checkpoints \
--diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
--prompt "$PROMPT" \
--video_save_name Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient \
--offload_tokenizer \
--offload_diffusion_transformer \
--offload_text_encoder_model \
--offload_prompt_upsampler \
--offload_guardrail_models
```

This generates a video file in the `outputs` directory.

<video controls autoplay muted style="max-width: 75%">
<source src="images/text2world_example.mp4" type="video/mp4">
</video>

Another example:

```bash
PROMPT="The video showcases a vibrant, magical garden where flowers bloom dynamically, opening and moving as though responding to a gentle rhythm in nature. \
Colorful butterflies glide gracefully through the air, and a small, clear stream winds its way through the scene, reflecting the warm glow of sunlight. \
A curious rabbit hops along a winding path, leading the viewer to a hidden alcove where a tree with golden, shimmering leaves stands, its branches moving slightly as if alive with energy. \
The entire scene radiates tranquility and wonder, inviting viewers to immerse themselves in the beauty of nature and magic combined."
```

```bash
PYTHONPATH=$(pwd) python3 cosmos1/models/diffusion/inference/text2world.py \
--checkpoint_dir checkpoints \
--diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
--prompt "$PROMPT" \
--video_save_name Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient \
--offload_tokenizer \
--offload_diffusion_transformer \
--offload_text_encoder_model \
--offload_prompt_upsampler \
--offload_guardrail_models
```

<video controls autoplay muted style="max-width: 75%">
<source src="images/Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient.mp4" type="video/mp4">
</video>
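
Both runs above pass the same flags and the same `--video_save_name`, so the second run likely replaces the first video. A small wrapper keeps the shared flags in one place and gives each prompt its own output name. This is a sketch under stated assumptions: the `text2world` function and the `COSMOS_DRY_RUN` variable are inventions of this example, not options of the Cosmos scripts.

```bash
# Sketch only: text2world and COSMOS_DRY_RUN are hypothetical names used for illustration.
text2world() {
    local prompt="$1" save_name="$2" runner=""
    # In dry-run mode, print the command instead of executing it.
    if [ "${COSMOS_DRY_RUN:-0}" = 1 ]; then
        runner="echo"
    fi
    PYTHONPATH=$(pwd) $runner python3 cosmos1/models/diffusion/inference/text2world.py \
        --checkpoint_dir checkpoints \
        --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
        --prompt "$prompt" \
        --video_save_name "$save_name" \
        --offload_tokenizer \
        --offload_diffusion_transformer \
        --offload_text_encoder_model \
        --offload_prompt_upsampler \
        --offload_guardrail_models
}

# Preview the command; unset COSMOS_DRY_RUN to actually run inference.
COSMOS_DRY_RUN=1
text2world "A curious rabbit hops along a winding path." magical_garden
```

With `COSMOS_DRY_RUN` unset, `text2world "$PROMPT" robot_warehouse` runs the same command as the examples above with a distinct save name.
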
Binary file not shown.
Binary file added docs/images/cosmos_jetson.jpg
Binary file added docs/images/text2world_example.mp4
Binary file not shown.
13 changes: 7 additions & 6 deletions docs/tutorial-intro.md
@@ -39,12 +39,13 @@ Give your locally running LLM an access to vision!

### Image Generation

| | |
| :---------- | :----------------------------------- |
| **[Flux + ComfyUI](./tutorial_comfyui_flux.md)** | Set up and run the ComfyUI with Flux model for image generation on Jetson Orin. |
| **[Stable Diffusion](./tutorial_stable-diffusion.md)** | Run AUTOMATIC1111's [`stable-diffusion-webui`](https://github.com/AUTOMATIC1111/stable-diffusion-webui) to generate images from prompts |
| **[SDXL](./tutorial_stable-diffusion-xl.md)** | Ensemble pipeline consisting of a base model and refiner with enhanced image generation. |
| **[nerfstudio](./nerf.md)** | Experience neural reconstruction and rendering with nerfstudio and onboard training. |
| | |
|:-------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------|
| **[Cosmos](./cosmos.md)** | Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. |
| **[Flux + ComfyUI](./tutorial_comfyui_flux.md)** | Set up and run the ComfyUI with Flux model for image generation on Jetson Orin. |
| **[Stable Diffusion](./tutorial_stable-diffusion.md)** | Run AUTOMATIC1111's [`stable-diffusion-webui`](https://github.com/AUTOMATIC1111/stable-diffusion-webui) to generate images from prompts |
| **[SDXL](./tutorial_stable-diffusion-xl.md)** | Ensemble pipeline consisting of a base model and refiner with enhanced image generation. |
| **[nerfstudio](./nerf.md)** | Experience neural reconstruction and rendering with nerfstudio and onboard training. |


### Audio
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -112,6 +112,7 @@ nav:
- ROS2 Nodes: ros.md
- OpenVLA: openvla.md
- Image Generation:
- Cosmos: cosmos.md
- Flux & ComfyUI: tutorial_comfyui_flux.md
- Stable Diffusion: tutorial_stable-diffusion.md
- Stable Diffusion XL: tutorial_stable-diffusion-xl.md
