Couldn't Reproduce the Code #3
Comments
Hi, thank you very much for your interest in our work. Which architecture are you currently merging: ViT-B/32, ViT-B/16, or ViT-L/14? As I remember from my experiments, ViT-B/32 and ViT-B/16 could both be run on a single 3090 GPU (i.e., 24 GB). If you just want to evaluate, you can directly load my trained merging coefficients, which can be found in merging_cofficient.py. Best,
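(A hypothetical illustration of the "just evaluate" path: skip coefficient training and plug released values into the merged model. The coefficient vector below is the ViT-B/16 task-wise one quoted later in this thread; how the merged model stores its trainable lambdas is assumed, so replace the placeholder attribute with whatever the actual code uses.)

```python
import torch

# ViT-B/16 task-wise coefficients quoted later in this thread (from merging_cofficient.py).
released_lambdas = [[1.0000, 0.1916, 0.1585, 0.2502, 0.3093, 0.2544, 0.3543, 0.2172, 0.1538]]

with torch.no_grad():
    # `lambdas_raw` is a placeholder for whatever parameter the merged model actually
    # uses to store the merging coefficients; adapt the name to the real code.
    adamerging_mtl_model.lambdas_raw.copy_(torch.tensor(released_lambdas))
# Then run the usual evaluation loop over exam_datasets.
```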
I have been trying the ViT-B/16 architecture on 8 datasets with the main_task_wise_adamerging method, using V100 GPUs (32 GB). However, I've run into some issues during data loading.
Hi, ViT-B/16 shouldn't take up much memory, as the checkpoint file for each dataset is only 426.55 MB. Can you run with ViT-B/32? Or did you adjust the batch size? For training the coefficients, I use a default batch size of 16. On each iteration/step the unlabeled test set is re-loaded in the code (with shuffling), so the batch data is not fixed across iterations. Best,
I am using the default batch size of 16, but the code crashes during the data-loading phase. The issue seems to occur around these lines:

```python
x = data['images'].to(args.device)
y = data['labels'].to(args.device)
outputs = adamerging_mtl_model(x, dataset_name)
```

I have summarized the CUDA memory usage here (the two `print` calls are mine, added to collect memory usage):

```python
for epoch in range(epochs):
    losses = 0.
    for dataset_name in exam_datasets:
        dataset = get_dataset(dataset_name, pretrained_model.val_preprocess, location=args.data_location, batch_size=16)
        dataloader = get_dataloader_shuffle(dataset)
        data = next(iter(dataloader))
        data = maybe_dictionarize(data)
        x = data['images'].to(args.device)
        y = data['labels'].to(args.device)
        outputs = adamerging_mtl_model(x, dataset_name)
        loss = softmax_entropy(outputs).mean(0)
        losses += loss
        print(dataset_name)
        print(f'{(torch.cuda.memory_allocated()/1024/1024/1024):.2f} GB')  # collect mem usage

    optimizer.zero_grad()
    losses.backward()
    optimizer.step()
```

Details:

```
python main_task_wise_adamerging.py
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/SUN397/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/Cars/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/RESISC45/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/EuroSAT/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/SVHN/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/GTSRB/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/MNIST/finetuned.pt
TaskVector:/home/monody/AdaMerging/checkpoints/ViT-B-16/DTD/finetuned.pt
Classification head for ViT-B-16 on SUN397 exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_SUN397.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_SUN397.pt
Classification head for ViT-B-16 on Cars exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_Cars.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_Cars.pt
Classification head for ViT-B-16 on RESISC45 exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_RESISC45.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_RESISC45.pt
Classification head for ViT-B-16 on EuroSAT exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_EuroSAT.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_EuroSAT.pt
Classification head for ViT-B-16 on SVHN exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_SVHN.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_SVHN.pt
Classification head for ViT-B-16 on GTSRB exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_GTSRB.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_GTSRB.pt
Classification head for ViT-B-16 on MNIST exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_MNIST.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_MNIST.pt
Classification head for ViT-B-16 on DTD exists at /home/monody/AdaMerging/checkpoints/ViT-B-16/head_DTD.pt
Loading classification head from /home/monody/AdaMerging/checkpoints/ViT-B-16/head_DTD.pt
init lambda: tensor([[1.0000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000]], grad_fn=)
collect_trainable_params: [Parameter containing: tensor([[0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000]], requires_grad=True)]
0%| | 1/1243 [00:02<58:34, 2.83s/it]
SUN397
6.21 GB
0%|▏ | 1/503 [00:01<12:01, 1.44s/it]
Cars
11.91 GB
0%|▎ | 1/394 [00:01<08:50, 1.35s/it]
RESISC45
17.61 GB
1%|▋ | 1/169 [00:01<03:55, 1.40s/it]
EuroSAT
23.31 GB
Using downloaded and verified file: /home/monody/AdaMerging/dataset/svhn/train_32x32.mat
Using downloaded and verified file: /home/monody/AdaMerging/dataset/svhn/test_32x32.mat
0%| | 1/1627 [00:01<37:30, 1.38s/it]
Traceback (most recent call last):
  .....
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 31.74 GiB total capacity; 28.84 GiB already allocated; 11.12 MiB free; 30.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
It can run by making this change, but the memory usage is still inefficient.

```python
for epoch in range(epochs):
    losses = 0.
    for dataset_name in exam_datasets:
        dataset = get_dataset(dataset_name, pretrained_model.val_preprocess, location=args.data_location, batch_size=16)
        dataloader = get_dataloader_shuffle(dataset)
        data = next(iter(dataloader))  # get one batch
        data = maybe_dictionarize(data)
        x = data['images'].to(args.device)
        y = data['labels'].to(args.device)
        outputs = adamerging_mtl_model(x, dataset_name)
        loss = softmax_entropy(outputs).mean(0)
        losses += loss
        print(dataset_name)
        print(f'{(torch.cuda.memory_allocated()/1024/1024/1024):.2f} GB')

    optimizer.zero_grad()
    losses.backward()
    optimizer.step()
```
Hi, summing the losses across the multiple datasets and then doing backpropagation and updating the parameters is the correct (intended) behavior.
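(Side note, not the repository's exact code: an equivalent but more memory-friendly way to get the same summed-loss update is to call backward() once per dataset and step once per epoch. Gradients are additive, so the accumulated gradient equals the gradient of the summed loss, while each dataset's computation graph is freed immediately instead of piling up across all 8 datasets. The sketch below reuses the names from the snippets above and assumes the merged weights are recomputed inside each forward call, which the per-dataset memory growth in the log suggests.)

```python
for epoch in range(epochs):
    optimizer.zero_grad()
    total_loss = 0.0  # plain float, kept only for logging
    for dataset_name in exam_datasets:
        dataset = get_dataset(dataset_name, pretrained_model.val_preprocess,
                              location=args.data_location, batch_size=16)
        dataloader = get_dataloader_shuffle(dataset)
        data = maybe_dictionarize(next(iter(dataloader)))
        x = data['images'].to(args.device)
        outputs = adamerging_mtl_model(x, dataset_name)
        loss = softmax_entropy(outputs).mean(0)
        loss.backward()            # accumulate grads; this dataset's graph is freed here
        total_loss += loss.item()  # .item() detaches, so no graph is retained
    optimizer.step()               # one update per epoch, matching the summed-loss version
```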
It is true that reloading the dataset on every iteration is not efficient. But due to RAM limitations I can't keep all the dataloaders in memory, so I have to read them separately on each iteration. A simple modification would be to remove all the training dataloaders, since this project never uses the training set, and only access the test set.
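(A rough sketch of that idea, assuming the shuffled test dataloaders alone fit in RAM; the get_dataset / get_dataloader_shuffle signatures are taken from the snippets earlier in the thread and are otherwise assumed.)

```python
# Build each shuffled *test* dataloader once instead of re-creating everything per iteration.
test_loaders = {}
for dataset_name in exam_datasets:
    dataset = get_dataset(dataset_name, pretrained_model.val_preprocess,
                          location=args.data_location, batch_size=16)
    test_loaders[dataset_name] = get_dataloader_shuffle(dataset)

test_iters = {name: iter(dl) for name, dl in test_loaders.items()}

def sample_batch(name):
    """Return the next shuffled test batch for `name`, restarting the pass when exhausted."""
    try:
        return next(test_iters[name])
    except StopIteration:
        test_iters[name] = iter(test_loaders[name])
        return next(test_iters[name])
```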
Hello, in `for dataset_name in exam_datasets:` the dataloader is re-initialized and then one batch is drawn, right? Is that equivalent to re-sampling from the 8 datasets in every epoch, i.e., the batch from dataset_A in epoch_i and the batch from dataset_A in epoch_j may contain overlapping samples, correct? At the moment, after 500 epochs my λ is [[1.0000, 0.1601, 0.0774, 0.0510, 0.0546, 0.0422, 0.0992, 0.0571, 0.7357]], which is still not close to the values given in merging_cofficient.py, [[1.0000, 0.1916, 0.1585, 0.2502, 0.3093, 0.2544, 0.3543, 0.2172, 0.1538]], and the average accuracy over the 8 datasets (Eval: Epoch: 499 Avg ACC: 0.6502134075542055) is trending downward. Could you provide more details? Thanks.
Hello,
In the concrete implementation, the data_loader is re-created every time the loop is entered (and the data is shuffled when it is loaded); for example, the shuffled dataloader in datasets/mnist.py is:
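(The snippet referenced here did not survive in this copy of the thread. Below is a minimal sketch of what such a shuffled test dataloader typically looks like, using torchvision's MNIST; the actual datasets/mnist.py may differ.)

```python
import torch
import torchvision.datasets as datasets

class MNIST:
    def __init__(self, preprocess, location, batch_size=128, num_workers=16):
        self.test_dataset = datasets.MNIST(root=location, download=True,
                                           train=False, transform=preprocess)
        # shuffle=True is what makes every re-created iterator yield a fresh
        # random batch from the test set.
        self.test_loader_shuffle = torch.utils.data.DataLoader(
            self.test_dataset, batch_size=batch_size,
            shuffle=True, num_workers=num_workers)
```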
In other words, each iteration randomly samples one batch from that dataset's test set. Because of the random sampling, the data drawn in different iterations may overlap a little, but will not overlap completely. In short, when optimizing the merging coefficients in AdaMerging, one batch is randomly drawn from each dataset to compute its loss, and the gradient of the summed loss is then used to update the merging coefficients. Best,
Hello,
I attempted to reproduce the code, but encountered some issues. Could you please provide some insights into how much memory is expected to be used? Additionally, I suspect there might be a memory leak.
Thank you for your assistance.
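(A small, generic sketch for telling an actual leak apart from memory that is simply held by the autograd graphs until backward(); torch.cuda.memory_allocated / max_memory_allocated / reset_peak_memory_stats are standard PyTorch APIs, everything else is illustrative.)

```python
import torch

def log_cuda_memory(tag):
    # Allocated = tensors currently alive; peak = high-water mark since the last reset.
    alloc = torch.cuda.memory_allocated() / 1024**3
    peak = torch.cuda.max_memory_allocated() / 1024**3
    print(f'{tag}: allocated={alloc:.2f} GB, peak={peak:.2f} GB')

# Call torch.cuda.reset_peak_memory_stats() at the start of each epoch and
# log_cuda_memory(dataset_name) after each forward pass. If allocated memory drops
# back to a stable baseline after losses.backward() / optimizer.step(), the growth
# is graph accumulation rather than a leak.
```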