Cyber is a model implementation that integrates state-of-the-art (SOTA) world models with the proposed CyberOrigin Dataset.
Follow this document to train the models using our readily available data, or to adapt your own data for training.
Our data covers home services, the logistics industry, and laboratory scenarios. For more details, please refer to our Official Data Website.
Format & Description
Currently, the dataset contains image tokens generated by MAGVIT2. For more information, please refer to the dataset card on Huggingface.
Download the Dataset
The dataset is currently available on Huggingface:
- 🤗 Pipette
- 🤗 Take Item
- 🤗 Twist Tube
- 🤗 Fold Towels
You can download the dataset using the following command:
bash ../scripts/download_dataset.sh
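If you prefer to fetch a single dataset from Python instead of the shell script, a minimal sketch using `huggingface_hub` is shown below; the repo ID is a hypothetical placeholder, so substitute the dataset you want from the list above.

```python
# Sketch: download one dataset directly with huggingface_hub instead of the
# shell script. The repo_id below is hypothetical; use the actual dataset
# repo from the list above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="cyberorigin/pipette",  # hypothetical placeholder repo ID
    repo_type="dataset",
    local_dir="./data/pipette",
)
```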
Visualize the Dataset
You can visualize the dataset using this notebook. Make sure to install Jupyter before running the notebook:
pip install jupyter notebook
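For a quick look outside the notebook, the sketch below renders one frame's raw token grid as a heatmap. The file name `videos.bin`, the uint32 dtype, and the 16×16-tokens-per-frame layout are assumptions based on the compressed format described later; this shows raw token IDs, not decoded pixels.

```python
# Standalone sketch: show one frame's raw MAGVIT2 token IDs as a heatmap.
# "videos.bin", the uint32 dtype, and the 16x16-per-frame layout are
# assumptions; decoding to pixels requires the MAGVIT2 checkpoint below.
import numpy as np
import matplotlib.pyplot as plt

tokens = np.fromfile("videos.bin", dtype=np.uint32).reshape(-1, 16, 16)
plt.imshow(tokens[0], cmap="viridis")
plt.title("Raw token IDs of frame 0")
plt.colorbar()
plt.show()
```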
Train the Model
Download the dataset (if you have not done so already) and launch training:
bash ../scripts/download_dataset.sh
python models/world/train_world.py --train-config-path configs/models/world/world_model.yaml --model-config-path configs/models/world/MagVIT_Genie.yaml
Note: The model will train on the default configuration provided.
The code is adapted from 1x's implementation of GENIE. The model is based on an ST-transformer (spatiotemporal transformer) architecture that predicts the next frame given the previous frames.
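To make the architecture concrete, here is a minimal sketch of one ST-transformer block, assuming the interleaved spatial- and temporal-attention design from GENIE; the block in this repo additionally supports the options listed in the config below (qkv_bias, qk_norm, muP, dropout) and causal temporal masking, which are omitted here for brevity.

```python
import torch
import torch.nn as nn

class STBlock(nn.Module):
    """One ST-transformer block: spatial attention within each frame,
    temporal attention across frames at each spatial position, then an MLP.
    A simplified sketch, not the repo's exact implementation."""

    def __init__(self, d_model: int = 256, num_heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.spatial_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, mlp_ratio * d_model),
            nn.GELU(),
            nn.Linear(mlp_ratio * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, S, D) -- T frames, S spatial tokens per frame.
        B, T, S, D = x.shape
        # Spatial attention: the S tokens within each frame attend to each other.
        h = self.norm1(x).reshape(B * T, S, D)
        h, _ = self.spatial_attn(h, h, h, need_weights=False)
        x = x + h.reshape(B, T, S, D)
        # Temporal attention: tokens at the same spatial position attend
        # across the T frames (causal masking omitted for brevity).
        h = self.norm2(x).transpose(1, 2).reshape(B * S, T, D)
        h, _ = self.temporal_attn(h, h, h, need_weights=False)
        x = x + h.reshape(B, S, T, D).transpose(1, 2)
        # Position-wise MLP.
        return x + self.mlp(self.norm3(x))

block = STBlock()
out = block(torch.randn(2, 16, 256, 256))  # (batch, T=16, S=256, d_model=256)
print(out.shape)  # torch.Size([2, 16, 256, 256])
```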
Model parameter tuning
The detailed configuration file is provided in the configs/models/world folder.
{
"num_layers": 32, // number of ST-transformer blocks
"num_heads": 8, // number of heads in multi-head attention
"d_model": 256, // dimension of the model latent
"T": 16, // number of frames in the input sequence
"S": 256, // number of tokens in the input sequence S=16x16
"image_vocab_size": 262144, // codebook size for the image tokens
"use_mup": false, // whether to use MUP
"num_factored_vocabs": 2, // number of factored vocabularies
"qkv_bias": false, // whether to use bias in qkv projection
"proj_bias": true, // whether to use bias in projection
"attn_drop": 0, // dropout rate in attention
"qk_norm": false, // whether to normalize qk
"mlp_ratio": 4, // ratio of hidden size to model latent size in MLP
"mlp_drop": 0, // dropout rate in MLP
"mlp_bias": true // whether to use bias in MLP
}
It is recommended to modify only the first three parameters (num_layers, num_heads, d_model) when adjusting the model size.
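Note how image_vocab_size and num_factored_vocabs fit together: 262144 = 512², so each image token ID can be split into two sub-tokens from a 512-entry vocabulary, which keeps each output softmax small. The exact factorization used in the code may differ; the sketch below only illustrates the arithmetic.

```python
# Sketch of the factored-vocabulary arithmetic implied by the config above.
image_vocab_size = 262144
num_factored_vocabs = 2
factored_size = round(image_vocab_size ** (1 / num_factored_vocabs))  # 512

token_id = 123456
factors = [(token_id // factored_size**i) % factored_size
           for i in range(num_factored_vocabs)]          # [64, 241]
reconstructed = sum(f * factored_size**i for i, f in enumerate(factors))
assert reconstructed == token_id
```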
Training parameter tuning
Please refer to the help message for hyperparameter descriptions:
python models/world/train.py -h
The code is modified from 1XGPT and Open-MAGVIT2, with unnecessary files and code removed.
Pretrained checkpoint
Download the checkpoint HERE, or run the following command:
huggingface-cli download 1x-technologies/worldmodel magvit2.ckpt --repo-type dataset --local-dir ./experiments/
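Once downloaded, you can quickly inspect the checkpoint with the sketch below; treating magvit2.ckpt as a PyTorch/Lightning-style checkpoint with a "state_dict" key is an assumption.

```python
# Sketch: inspect the downloaded checkpoint. The "state_dict" key is an
# assumption about how magvit2.ckpt is stored.
import torch

ckpt = torch.load("experiments/magvit2.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)
print(f"{len(state_dict)} tensors; first key: {next(iter(state_dict))}")
```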
Try with our provided samples
We provide a notebook for compressing and decompressing your own videos: open autoencoder_demo.ipynb and follow the instructions.
Compress your video data
root-folder
├── subfolder1/
│   └── color/ (videos inside)
│       ├── 1b17c56e-02acc10001.mp4
│       ├── 0ce6190f-02acc11002.mp4
│       └── ...
└── subfolder2/
    └── ...
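A quick way to verify that your data matches this layout is the standalone sketch below (not part of the repo); it assumes the root/<subfolder>/color/*.mp4 structure shown above.

```python
# Standalone sanity check that your data matches the expected layout:
# root/<subfolder>/color/*.mp4
from pathlib import Path

root = Path("/path/to/root/folder")
for sub in sorted(p for p in root.iterdir() if p.is_dir()):
    videos = list((sub / "color").glob("*.mp4"))
    print(f"{sub.name}: {len(videos)} video(s)")
```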
Make sure your dataset matches the structure above, then run:
python experiments/compress_video.py --root_path /path/to/root/folder --ckpt_path experiments/magvit2.ckpt --output_path /path/to/output/folder
The following files will be generated in output_path/date_folder/compressed:
- videos.bin
- metadata.json
- segment_ids.bin
You can then decompress them to check the reconstructed video.
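Before decompressing, you can inspect the generated files with the sketch below; the uint32 dtype and the one-segment-ID-per-frame layout of segment_ids.bin are assumptions, not a documented spec.

```python
# Sketch: inspect the compressed output. dtype and per-frame segment-ID
# layout are assumptions.
import json
import numpy as np
from pathlib import Path

compressed = Path("/path/to/output/folder")  # .../date_folder/compressed
metadata = json.loads((compressed / "metadata.json").read_text())
segment_ids = np.fromfile(compressed / "segment_ids.bin", dtype=np.uint32)
print(metadata)
print(f"{len(segment_ids)} frames across {len(np.unique(segment_ids))} segments")
```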