
Art2Mus is a system that generates music based on digitized artworks and text by using the AudioLDM2 architecture with an added projection layer for integrating visual and textual inputs.


justivanr/art2mus_


Art2Mus


Introduction 🚀

This repository presents Art2Mus, an Artwork-Based Music Generation System. Art2Mus leverages the AudioLDM2 architecture with an additional projection layer that enables digitized artworks, alongside text, to be used as conditioning information to guide the music generation process.

ImageBind is used to generate image embeddings. The scripts used to compute the Fréchet Audio Distance (FAD) score during training are taken from this repo.

Note

If you encounter any problems, we kindly ask you to open an issue summarizing the problem and, if possible, suggesting a solution.

Installation of Requirements 🤖

Tip

We highly recommend using a virtual environment for installing the required packages. If your Python version is not 3.10.12, consider using a conda virtual environment.
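For example, a virtual environment can be created with Python's standard venv module (the environment name .venv below is just an example; with conda you could instead run something like `conda create -n art2mus python=3.10.12`):

```shell
# Create an isolated environment in .venv (a minimal sketch)
python3 -m venv .venv

# Activate it (on Windows: .venv\Scripts\activate)
. .venv/bin/activate
```

Once the environment is active, continue with the pip commands below.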

Before installing the requirements, upgrade pip in your virtual environment to avoid potential installation issues. Use the following command (on Windows, python.exe -m pip install --upgrade pip):

python -m pip install --upgrade pip

Once pip is updated, install the necessary libraries listed in the requirements.txt file to ensure the proper functioning of the model and the visualization of results. You can install them with the following command:

pip install -r requirements.txt

Note

The final step may take some time due to the installation of various libraries.

Run Art2Mus 🖼️🎵

Important

The code should work regardless of whether CUDA is installed on your machine. However, please note that inference will take longer on a CPU compared to a GPU.

To run Art2Mus with the provided example digitized artwork, run the test_art2mus.py script as is. Alternatively, you can add your own digitized artwork to the art2mus_/test_images/ folder and use it to generate new music!

To generate music from your own digitized artwork, update the EXAMPLE_ARTWORK_PATH variable in the test_art2mus.py script with the path to your artwork, then run the script.
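Since generation can take a while (especially on CPU), it may help to validate the path before launching the script. The helper below is a hypothetical sketch, not part of the repository:

```python
from pathlib import Path

# Hypothetical helper: sanity-check an artwork path before launching
# the (potentially long) generation run in test_art2mus.py.
def check_artwork(path_str: str) -> Path:
    path = Path(path_str)
    if not path.is_file():
        raise FileNotFoundError(f"Artwork not found: {path}")
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
        raise ValueError(f"Unsupported image format: {path.suffix}")
    return path
```

For instance, `check_artwork("art2mus_/test_images/my_artwork.jpg")` (a hypothetical file) would raise a clear error early instead of failing mid-run.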

Train Art2Mus 🛠️

Download Data 💾


The Artwork-Music dataset we used consists of digitized artworks taken from the ArtGraph knowledge graph and music from the large subset (fma_large) of the Free Music Archive (FMA) dataset.

ArtGraph's digitized artworks can be downloaded from Zenodo. Extract the .zip file contents, and place the imagesf2 folder in the data/images/ folder.

For the FMA dataset, you can easily download it using the script we provide. As specified in the script, you need a Kaggle account to download the FMA dataset. The dataset will be stored in the data/audio/ folder.

Additionally, you will need to download the digitized artwork and music embeddings. Click here to download them. After downloading, extract the .safetensors files and place them in the art2mus_/data/extra/ folder.

In the end, your data folder should look like this:

(Figure: expected layout of the data folder)

The fma_large folder will contain several subfolders (e.g., 001, 002, 003, etc.). The imagesf2 folder should contain all the digitized artworks available in ArtGraph. Finally, the extra folder should contain the digitized artworks and music embeddings, as well as a .json file that contains all the artwork-music pairs.
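As a quick sanity check, the layout described above can be verified with a small script. This is a hypothetical helper (folder names are taken from this README, everything else is an assumption):

```python
from pathlib import Path

# Expected data folders, per the download steps above.
EXPECTED_DIRS = [
    "data/images/imagesf2",   # ArtGraph digitized artworks
    "data/audio/fma_large",   # FMA audio subfolders (001, 002, ...)
    "art2mus_/data/extra",    # embeddings (.safetensors) + pairs .json
]

def missing_dirs(root: str = ".") -> list:
    """Return the expected data folders not present under root."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
```

An empty return value means the layout matches; otherwise the list names what still needs to be downloaded or moved.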


Run Training 🎨🎶


Important

The following subfolders are required for the training to work properly: art2mus_/src/art2mus/tmp_ground_truth and art2mus_/src/art2mus/tmp_generated. If you do not find them under the art2mus_/src/art2mus folder, you need to create them.

We provide both Art2Mus and Art2Mus-4 training codes to allow you to train your own version of Art2Mus. The training scripts can be found in the following folder: art2mus_/src/art2mus, and are named art2mus_train.py and art2mus_4_train.py.

You can either train the Art2Mus (or Art2Mus-4) image projection layer from scratch or fine-tune it using the weights provided in the following folder: art2mus_/art2mus_weights. If you want to train it from scratch, you need to move the weights associated with the model you want to train/tune out of art2mus_/art2mus_weights.

Note

Before launching the training script, we suggest carefully debugging the code to ensure you understand the overall training process. Additionally, we use Wandb to track our training, so you must set it up first.
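One way to configure Weights & Biases is sketched below; `wandb login` is interactive and requires a W&B account and API key, so the offline mode (which matches the --set_wandb_offline flag in the example command) may be more convenient on headless machines:

```shell
# Authenticate once (interactive; requires a wandb API key):
#   wandb login
# Or run fully offline and sync the logs later:
export WANDB_MODE=offline
```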

Below is an example of how you can launch the training/tuning using the accelerate library:

nohup accelerate launch src/art2mus/art2mus_train.py --num_epochs 20 --large_batch_size 8 --lr_warmup_steps 250  --dataloader_num_workers 16 --use_snr_gamma --set_wandb_offline

Details on each additional parameter that you can list after the training script can be found in the train_test_argparse.py script.
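The flags in the example command above roughly correspond to an argparse setup like the following. This is a hypothetical reconstruction for illustration; the authoritative definitions (including defaults and help strings) live in train_test_argparse.py:

```python
import argparse

# Hypothetical mirror of the flags used in the example launch command;
# see train_test_argparse.py for the real definitions.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Art2Mus training options")
    parser.add_argument("--num_epochs", type=int, default=20)
    parser.add_argument("--large_batch_size", type=int, default=8)
    parser.add_argument("--lr_warmup_steps", type=int, default=250)
    parser.add_argument("--dataloader_num_workers", type=int, default=16)
    parser.add_argument("--use_snr_gamma", action="store_true")
    parser.add_argument("--set_wandb_offline", action="store_true")
    return parser
```

Boolean switches such as --use_snr_gamma are simply present or absent on the command line; the numeric flags take a value.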

Change Log

  • 2024-08-18: Uploaded weights for Art2Mus and Art2Mus-4! 🌟
  • 2024-08-24: Uploaded training scripts for Art2Mus and Art2Mus-4! 🌟

TODO

  • Open-source Art2Mus's training code.
  • Improve the quality of the generated music.
  • Optimize the overall inference speed of Art2Mus.
  • Test the impact of image transformations on the final generated music.

Cite this work

If you found this repository useful, please consider citing:

@misc{rinaldi2024art2musbridgingvisualarts,
      title={Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation},
      author={Ivan Rinaldi and Nicola Fanelli and Giovanna Castellano and Gennaro Vessio},
      year={2024},
      eprint={2410.04906},
      archivePrefix={arXiv},
      primaryClass={cs.MM},
      url={https://arxiv.org/abs/2410.04906},
}
