The development of world models has long been a cornerstone of advanced robotics research, with most approaches relying heavily on vast, platform-specific datasets. These datasets, while valuable, often limit scalability and generalization across robotic platforms, restricting their broader applicability.
In contrast, CYBER approaches world modeling from first principles, drawing inspiration from how humans naturally acquire skills through experience and interaction with their environment. CYBER is the first general Robotic Operational System designed to learn from both teleoperated manipulation data and human operation data, enabling robots to learn and predict across a wide range of tasks and environments. It is built around a Physical World Model, a cross-embodiment Visual-Language Action (VLA) Model, a Perception Model, a Memory Model, and a Control Model, which together help robots learn, predict, and remember across tasks and embodiments.
CYBER also provides millions of human operation data samples and baseline models on Hugging Face 🤗 to advance embodied learning, along with an experimental evaluation toolbox to help researchers test and evaluate their models in both simulation and the real world.
- 🛠️ Modular: Built with a modular architecture, allowing flexibility in various environments.
- 📊 Data-Driven: Leverages millions of human operation datasets to enhance embodied learning.
- 📈 Scalable: Scales across different robotic platforms, adapting to new environments and tasks.
- 🔧 Customizable: Allows for customization and fine-tuning to meet specific requirements.
- 📚 Extensible: Supports the addition of new modules and functionalities, enhancing capabilities.
- 📦 Open Source: Open-source and freely available, fostering collaboration and innovation.
- 🔬 Experimental: Supports experimentation and testing, enabling continuous improvement.
CYBER is built with a modular architecture, allowing for flexibility and customization. Here are the key components:
- 🌍 World Model: Learns from physical interactions to understand and predict the environment.
- 🎬 Action Model: Learns from actions and interactions to perform tasks and navigate.
- 👁️ Perception Model: Processes sensory inputs to perceive and interpret surroundings.
- 🧠 Memory Model: Utilizes past experiences to inform current decisions.
- 🎮 Control Model: Manages control inputs for movement and interaction.
🌍 World Model is now available. Additional models will be released soon.
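As a rough illustration of how these five modules could fit together, here is a minimal sketch of a perception-world-action loop. All class and method names below are hypothetical placeholders, not CYBER's actual API; consult the released code for the real interfaces.

```python
# Hypothetical sketch of how CYBER's five modules might compose.
# None of these class or method names come from the CYBER API.

class PerceptionModel:
    def observe(self, raw_sensor_data):
        # Turn raw sensor input into a structured state estimate.
        return {"state": len(raw_sensor_data)}

class WorldModel:
    def predict(self, state, action):
        # Predict the next environment state for a candidate action.
        return {"state": state["state"] + 1, "last_action": action}

class ActionModel:
    def act(self, state, episodes):
        # Choose an action conditioned on the perceived state and past experience.
        return "grasp" if episodes else "explore"

class MemoryModel:
    def __init__(self):
        self.episodes = []
    def store(self, state, action):
        # Record the (state, action) pair for future decisions.
        self.episodes.append((state, action))

class ControlModel:
    def execute(self, action):
        # Translate a high-level action into low-level control commands.
        return f"executing: {action}"

# One step of the loop:
perception, world, actor = PerceptionModel(), WorldModel(), ActionModel()
memory, control = MemoryModel(), ControlModel()

state = perception.observe([0.1, 0.2, 0.3])
action = actor.act(state, memory.episodes)
predicted = world.predict(state, action)   # imagined outcome before acting
memory.store(state, action)
print(control.execute(action))             # -> executing: explore
```

The point of the sketch is the data flow: perception produces a state, the action model proposes an action, the world model imagines its outcome, memory records the episode, and control executes the chosen action.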
You will need Anaconda installed on your machine. If you don't have it installed, you can follow the installation instructions here.
You can run the following commands to install CYBER:
```shell
bash scripts/build.sh
```
Alternatively, you can install it manually by following the steps below:
1. Create a clean conda environment:

   ```shell
   conda create -n cyber python=3.10 && conda activate cyber
   ```
2. Install PyTorch and torchvision (PyTorch 2.3.0 conda builds are published with `pytorch-cuda` rather than `cudatoolkit`):

   ```shell
   conda install pytorch==2.3.0 torchvision==0.18.0 pytorch-cuda=12.1 -c pytorch -c nvidia
   ```
3. Install the CYBER package:

   ```shell
   pip install -e .
   ```
CYBER leverages the power of Hugging Face for model sharing and collaboration. You can easily access and use our models through the Hugging Face platform.
Currently, four tasks are available for download:
- 🤗 Pipette: Bimanual human demonstration dataset of precision pipetting tasks for laboratory manipulation.
- 🤗 Take Item: Single-arm manipulation demonstrations of object pick-and-place tasks.
- 🤗 Twist Tube: Bimanual demonstration dataset of coordinated tube manipulation sequences.
- 🤗 Fold Towels: Bimanual manipulation demonstrations of deformable object folding procedures.
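A download sketch using the `huggingface_hub` library is shown below. The repository IDs here are illustrative placeholders, not verified dataset names; check the project's Hugging Face page for the actual identifiers.

```python
# Sketch: downloading a CyberOrigin task dataset via huggingface_hub.
# The repo IDs below are hypothetical placeholders, not the real dataset names.

# Illustrative mapping from task name to a hypothetical Hugging Face repo id.
TASKS = {
    "pipette":     "cyberorigin/pipette",
    "take_item":   "cyberorigin/take_item",
    "twist_tube":  "cyberorigin/twist_tube",
    "fold_towels": "cyberorigin/fold_towels",
}

def repo_id_for(task: str) -> str:
    """Return the (hypothetical) repo id for a named task."""
    if task not in TASKS:
        raise KeyError(f"unknown task: {task}; choose from {sorted(TASKS)}")
    return TASKS[task]

def download(task: str) -> str:
    """Download a full dataset snapshot and return its local path."""
    from huggingface_hub import snapshot_download  # requires `pip install huggingface_hub`
    return snapshot_download(repo_id=repo_id_for(task), repo_type="dataset")

# Example (performs a network download when run):
# local_path = download("pipette")
```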
Our pretrained models will be released on Hugging Face soon:
- Cyber-World-Large (Coming Soon)
- Cyber-World-Base (Coming Soon)
- Cyber-World-Small (Coming Soon)
Please refer to the `experiments` directory for more details on data downloading and model training.
```
├── ...
├── docs               # documentation files and figures
├── docker             # docker files for containerization
├── examples           # example code snippets
├── tests              # test cases and scripts
├── scripts            # scripts for setup and utilities
├── experiments        # model implementation and details
│   ├── configs        # model configurations
│   ├── models         # model training and evaluation scripts
│   ├── notebooks      # sample notebooks
│   └── ...
├── cyber              # compression, model training, and dataset source code
│   ├── dataset        # dataset processing and loading
│   ├── utils          # utility functions
│   └── models         # model definitions and architectures
│       ├── action     # visual language action model
│       ├── control    # robot platform control model
│       ├── memory     # lifelong memory model
│       ├── perception # perception and scene understanding model
│       └── ...
└── ...
```
MAGVIT2 and GENIE are adapted from the 1X World Model Challenge: 1X Technologies. (2024). *1X World Model Challenge* (Version 1.1) [Data set].
```bibtex
@inproceedings{wang2024hpt,
  author    = {Lirui Wang and Xinlei Chen and Jialiang Zhao and Kaiming He},
  title     = {Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers},
  booktitle = {NeurIPS},
  year      = {2024}
}

@article{luo2024open,
  title   = {Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation},
  author  = {Luo, Zhuoyan and Shi, Fengyuan and Ge, Yixiao and Yang, Yujiu and Wang, Limin and Shan, Ying},
  journal = {arXiv preprint arXiv:2409.04410},
  year    = {2024}
}
```
| property | value |
|---|---|
| name | CyberOrigin Dataset |
| url | https://github.com/CyberOrigin2077/Cyber |
| description | CYBER is a model implementation that integrates state-of-the-art (SOTA) world models with the proposed CyberOrigin Dataset. |
| provider | |
| license | |
If you have technical questions, please open a GitHub issue. For business development or other collaboration inquiries, feel free to contact us through email 📧 ([email protected]). Enjoy! 🎉