
Couldn't Reproduce the Code #3

Open · monody1 opened this issue May 22, 2024 · 9 comments

@monody1

monody1 commented May 22, 2024

Hello,

I attempted to reproduce the code, but encountered some issues. Could you please provide some insights into how much memory is expected to be used? Additionally, I suspect there might be a memory leak.

dataset_name:SUN397
torch.cuda.memory_allocated:6.21 GB
0%|▏ | 1/503 [00:01<14:33, 1.74s/it]
dataset_name:Cars
torch.cuda.memory_allocated:11.91 GB
0%|▎ | 1/394 [00:01<08:39, 1.32s/it]
dataset_name:RESISC45
torch.cuda.memory_allocated:17.61 GB
1%|▋ | 1/169 [00:01<03:21, 1.20s/it]
dataset_name:EuroSAT
torch.cuda.memory_allocated:23.31 GB
Using downloaded and verified file: /home/monody/AdaMerging/dataset/svhn/train_32x32.mat
Using downloaded and verified file: /home/monody/AdaMerging/dataset/svhn/test_32x32.mat
0%| | 1/1627 [00:01<38:39, 1.43s/it]
dataset_name:SVHN
torch.cuda.memory_allocated:29.01 GB
0%| | 0/790 [00:00<?, ?it/s]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 38.00 MiB (GPU 0; 31.74 GiB total capacity; 29.65 GiB already allocated; 25.12 MiB free; 31.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
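For reference, the workaround that the error message itself points to (PYTORCH_CUDA_ALLOC_CONF with max_split_size_mb) is a general PyTorch allocator setting rather than anything specific to this repository; a minimal sketch of applying it from Python, assuming it is set before CUDA is first initialized (the value 128 is only illustrative):

# General PyTorch allocator workaround referenced by the error message above.
# It must be set before the CUDA caching allocator starts up; the simplest way
# is to export it in the shell, or set it before importing torch.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # torch picks up the setting when CUDA is first used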

Thank you for your assistance.

@EnnengYang
Owner

Hi,

Thank you very much for your interest in our work.

Which architecture are you currently merging: ViT-B/32, ViT-B/16, or ViT-L/14? As I recall, when I ran the experiments, ViT-B/32 and ViT-B/16 could be executed on a single 3090 GPU (i.e., 24 GB).

If you just want to evaluate, you can directly load my trained merge coefficients, which can be found at merging_cofficient.py.
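(A rough sketch of that evaluation-only path is below; the imported variable name is hypothetical, so please check merging_cofficient.py for the actual one.)

# Hedged sketch only: load pre-trained task-wise merging coefficients instead of optimizing them.
# `taskwise_coefficients_vitb16` is a made-up name; the real variable lives in merging_cofficient.py.
import torch
from merging_cofficient import taskwise_coefficients_vitb16  # hypothetical import

lambdas = torch.tensor(taskwise_coefficients_vitb16)  # 1 x (1 + num_tasks): pretrained weight + task coefficients
# Use `lambdas` in place of the trainable coefficients, then run the evaluation loop directly.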

Best,
Enneng

@monody1
Author

monody1 commented May 22, 2024

I have been trying the ViT-B/16 architecture on 8 datasets using V100 GPUs (32 GB) and main_task_wise_adamerging.py. However, I've run into some issues during data loading.
Additionally, regarding the loss calculation in the unsupervised setting with 8 datasets: is it correct to understand that the unsupervised loss is accumulated over one batch from each of the 8 datasets, and a single backward pass is then performed? Also, is the order of these 8 batches fixed during unsupervised training?

@EnnengYang
Owner

Hi,

ViT-B/16 doesn't seem like it should take up much memory, since the checkpoint file for each dataset is only 426.55 MB. Can you run with ViT-B/32?

Or did you adjust the batch size? For training the coefficients, the default is 16.

On each iteration/step, the code re-loads the unlabeled test set (with shuffling enabled), so the batch data is not fixed across iterations.

Best,
Enneng

@monody1
Author

monody1 commented May 22, 2024

I am using the default batch size of 16, but the code crashes during the data loading phase. The issue seems to occur at

x = data['images'].to(args.device)
y = data['labels'].to(args.device)
outputs = adamerging_mtl_model(x, dataset_name)

Is this proceeding as expected?

I have summarized the CUDA memory usage here.

for epoch in range(epochs):
    losses = 0.
    for dataset_name in exam_datasets:
        dataset = get_dataset(dataset_name, pretrained_model.val_preprocess, location=args.data_location, batch_size=16)
        dataloader = get_dataloader_shuffle(dataset)
        data = next(iter(dataloader))
        data = maybe_dictionarize(data)
        x = data['images'].to(args.device)
        y = data['labels'].to(args.device)

        outputs = adamerging_mtl_model(x, dataset_name)
        loss = softmax_entropy(outputs).mean(0)
        losses += loss

        print(dataset_name)
        print(f'{(torch.cuda.memory_allocated()/1024/1024/1024):.2f} GB')   #collect mem usage
    optimizer.zero_grad()
    losses.backward()
    optimizer.step()
Details

python main_task_wise_adamerging.py
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/SUN397/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/Cars/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/RESISC45/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/EuroSAT/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/SVHN/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/GTSRB/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/MNIST/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/DTD/finetuned.pt
Classification head for ViT-B-16 on SUN397 exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_SUN397.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_SUN397.pt
Classification head for ViT-B-16 on Cars exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_Cars.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_Cars.pt
Classification head for ViT-B-16 on RESISC45 exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_RESISC45.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_RESISC45.pt
Classification head for ViT-B-16 on EuroSAT exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_EuroSAT.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_EuroSAT.pt
Classification head for ViT-B-16 on SVHN exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_SVHN.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_SVHN.pt
Classification head for ViT-B-16 on GTSRB exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_GTSRB.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_GTSRB.pt
Classification head for ViT-B-16 on MNIST exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_MNIST.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_MNIST.pt
Classification head for ViT-B-16 on DTD exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_DTD.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_DTD.pt
init lambda: tensor([[1.0000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000]], grad_fn=)
collect_trainable_params: [Parameter containing: tensor([[0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000]], requires_grad=True)]
0%| | 1/1243 [00:02<58:34, 2.83s/it]
SUN397
6.21 GB
0%|▏ | 1/503 [00:01<12:01, 1.44s/it]
Cars
11.91 GB
0%|▎ | 1/394 [00:01<08:50, 1.35s/it]
RESISC45
17.61 GB
1%|▋ | 1/169 [00:01<03:55, 1.40s/it]
EuroSAT
23.31 GB
Using downloaded and verified file: /home/monody/AdaMerging/dataset/svhn/train_32x32.mat
Using downloaded and verified file: /home/monody/AdaMerging/dataset/svhn/test_32x32.mat
0%| | 1/1627 [00:01<37:30, 1.38s/it]
Traceback (most recent call last):
.....
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 31.74 GiB total capacity; 28.84 GiB already allocated; 11.12 MiB free; 30.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

@monody1
Copy link
Author

monody1 commented May 22, 2024

It can run by making this change, but the memory usage is still inefficient.

# for epoch in range(epochs):
losses = 0.
for dataset_name in exam_datasets:
    dataset = get_dataset(dataset_name, pretrained_model.val_preprocess, location=args.data_location, batch_size=16)
    dataloader = get_dataloader_shuffle(dataset)
    data = next(iter(dataloader))  # get one batch
    data = maybe_dictionarize(data)
    x = data['images'].to(args.device)
    y = data['labels'].to(args.device)
    outputs = adamerging_mtl_model(x, dataset_name)
    loss = softmax_entropy(outputs).mean(0)
    losses += loss
    print(dataset_name)
    print(f'{(torch.cuda.memory_allocated()/1024/1024/1024):.2f} GB')
optimizer.zero_grad()
losses.backward()
optimizer.step()


@EnnengYang
Owner

> I am using the default batch size of 16, but the code crashes during the data loading phase. […]

Hi,

Yes, that is correct: the losses from the multiple datasets are summed, and the backward pass and parameter update are then performed on the summed loss.
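Since gradients are additive, an alternative that is mathematically equivalent to summing the losses is to call backward() once per dataset and let the gradients accumulate; each dataset's activation graph is then freed immediately, which should lower peak GPU memory. A minimal sketch, reusing the helpers from the snippet above and assuming the merged weights are rebuilt inside every forward call (if the merging graph were instead shared across datasets, the per-dataset backward would need retain_graph=True and the benefit would shrink):

for epoch in range(epochs):
    optimizer.zero_grad()
    for dataset_name in exam_datasets:
        dataset = get_dataset(dataset_name, pretrained_model.val_preprocess, location=args.data_location, batch_size=16)
        dataloader = get_dataloader_shuffle(dataset)
        data = maybe_dictionarize(next(iter(dataloader)))
        x = data['images'].to(args.device)

        outputs = adamerging_mtl_model(x, dataset_name)
        loss = softmax_entropy(outputs).mean(0)
        loss.backward()  # gradients accumulate in the merging coefficients; activations for this dataset are freed here
    optimizer.step()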

@EnnengYang
Owner

> It can run by making this change, but the memory usage is still inefficient. […]

It is true that reloading the dataset at every iteration is not efficient, but due to RAM limitations I can't keep all the dataloaders in memory, so I have to re-load them separately at each iteration.

A simple modification would be to remove all the training dataloaders and only load the test set, since this project never uses the training set.
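A hedged illustration of that suggestion, modeled on the datasets/mnist.py pattern quoted later in this thread (the class name and constructor arguments here are assumptions, not the repository's code):

import torch
import torchvision

class MNISTTestOnly:
    """Sketch of a dataset wrapper that builds only the shuffled test loader."""
    def __init__(self, preprocess, location, batch_size=16, num_workers=4):
        # Skip train_dataset / train_loader entirely; only the test split is needed
        # for the unsupervised coefficient optimization.
        self.test_dataset = torchvision.datasets.MNIST(
            root=location, train=False, download=True, transform=preprocess)
        self.test_loader_shuffle = torch.utils.data.DataLoader(
            self.test_dataset, batch_size=batch_size,
            shuffle=True, num_workers=num_workers)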

@monody1
Author

monody1 commented Jun 3, 2024

Hello,

I still haven't been able to reproduce the ViT-B/16 results. Each time the loop

for dataset_name in exam_datasets:

is entered, the dataloader is re-initialized and one batch is drawn, correct? Is that equivalent to re-sampling from the 8 datasets at every epoch, i.e., the samples in the batch from dataset_A in epoch_i and the batch from dataset_A in epoch_j may overlap, right? At 500 epochs, λ is currently [[1.0000, 0.1601, 0.0774, 0.0510, 0.0546, 0.0422, 0.0992, 0.0571, 0.7357]], which is still not close to the values given in merging_cofficient.py, [[1.0000, 0.1916, 0.1585, 0.2502, 0.3093, 0.2544, 0.3543, 0.2172, 0.1538]]. Moreover, the average accuracy over the 8 datasets (Eval: Epoch: 499 Avg ACC: 0.6502134075542055) is trending downward. Could you provide more details?

Thank you

@EnnengYang
Owner

Hi,

for epoch in range(epochs):
    for dataset_name in exam_datasets:
        dataset = get_dataset(dataset_name, pretrained_model.val_preprocess, location=args.data_location, batch_size=16)
        dataloader = get_dataloader_shuffle(dataset)

That is, in the concrete implementation, the data_loader is re-created every time the loop is entered (and the data is shuffled when it is loaded). For example, the shuffled dataloader in datasets/mnist.py is:

self.test_loader_shuffle = torch.utils.data.DataLoader(
    self.test_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=num_workers
)

In other words, each iteration randomly samples one batch from that dataset's test set. Because the sampling is random, the data drawn across iterations may overlap slightly, but it will not overlap completely.

In short, when optimizing the merging coefficients in AdaMerging, one batch is randomly drawn from each dataset to compute the loss, and the gradient of the summed loss is then used to update the merging coefficients.

Best
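For completeness, the softmax_entropy used in the snippets above is the standard test-time entropy objective (Shannon entropy of the softmax predictions, as in Tent-style entropy minimization); a minimal sketch that may differ slightly from the repository's exact implementation:

import torch

def softmax_entropy(x: torch.Tensor) -> torch.Tensor:
    """Entropy of the softmax predictions, one value per sample in the batch."""
    return -(x.softmax(dim=1) * x.log_softmax(dim=1)).sum(dim=1)

# Usage as in the thread: average over the batch, then sum across datasets.
# loss = softmax_entropy(outputs).mean(0)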
