CVAF is a computer-vision based automation framework. It allows you to programatically spawn/control/kill a desktop environment. Additonally, it has a vision system that handles open-vocabulary pixel coordinate prediction for UI elements.
Prerequisite: Install Docker
-
Clone the repository:
git clone https://github.com/sampagon/CVAF.git cd CVAF
-
Install dependencies:
Only tested on Python 3.11 so farpip install -r requirements.txt
This may take longer than expected on the first run because the desktop environment image and vision system need to be downloaded from dockerhub and huggingface, respectively.
python test.py