
torch.distributed.launch on eight 40G A100, CUDA out of memory. #26

zhengbiqing opened this issue Sep 14, 2023 · 0 comments

I run:
export CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7'
task=gene
datadir=data/$task
outdir=runs/$task/GPT2
name=gene0913
checkpoint=/root/siton-glusterfs-eaxtsxdfs/xts/data/BioMedLM
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --use_env run_seqcls_gpt.py \
    --tokenizer_name $checkpoint --model_name_or_path $checkpoint \
    --train_file $datadir/train.json --validation_file $datadir/dev.json --test_file $datadir/test.json \
    --do_train --do_eval --do_predict \
    --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 1 \
    --learning_rate 2e-6 --warmup_ratio 0.5 --num_train_epochs 5 --max_seq_length 32 \
    --logging_steps 1 --save_strategy no --evaluation_strategy no \
    --output_dir $outdir --overwrite_output_dir --bf16 --seed 1000 --run_name $name
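For a rough sense of what this launch asks of each GPU (my own back-of-the-envelope, not a number from this repo): torch.distributed.launch with --nproc_per_node=8 runs plain DDP, so every GPU holds a full copy of the model, its gradients, and the Adam optimizer state; nothing is sharded across the eight cards. Assuming BioMedLM's ~2.7B parameters and fp32 weights, gradients, and Adam state (roughly 16 bytes per parameter), that alone is about 43 GB per GPU, before activations and the CUDA context:

# Rough per-GPU footprint under plain DDP with Adam (assumed: fp32 weights,
# fp32 gradients, fp32 Adam m/v; activations and CUDA context come on top).
PARAMS=2700000000   # ~2.7B parameters for BioMedLM
echo "fp32 weights:             $(( PARAMS * 4  / 1000000000 )) GB"
echo "fp32 gradients:           $(( PARAMS * 4  / 1000000000 )) GB"
echo "fp32 Adam m and v:        $(( PARAMS * 8  / 1000000000 )) GB"
echo "total before activations: $(( PARAMS * 16 / 1000000000 )) GB"   # ~43 GB > 40 GB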

Even with per_device_train_batch_size 1, I still get CUDA out of memory.
Does anyone know how many GPUs are needed to fine-tune the seqcls task?
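For anyone hitting the same wall, here is a sketch of a lower-memory variant. It assumes run_seqcls_gpt.py exposes the standard HuggingFace TrainingArguments flags (not verified against this repo): gradient checkpointing trades extra compute for activation memory, i.e. the same launch with one extra flag:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --use_env run_seqcls_gpt.py \
    --tokenizer_name $checkpoint --model_name_or_path $checkpoint \
    --train_file $datadir/train.json --validation_file $datadir/dev.json --test_file $datadir/test.json \
    --do_train --do_eval --do_predict \
    --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 1 \
    --learning_rate 2e-6 --warmup_ratio 0.5 --num_train_epochs 5 --max_seq_length 32 \
    --logging_steps 1 --save_strategy no --evaluation_strategy no \
    --output_dir $outdir --overwrite_output_dir --bf16 --seed 1000 --run_name $name \
    --gradient_checkpointing

Even then, the fp32 weights, gradients, and Adam state still add up to roughly 43 GB per GPU under plain DDP, so sharding the optimizer state (for example DeepSpeed ZeRO stage 2 through the Trainer's --deepspeed option, if the script supports it) is probably what would actually make a 2.7B model fit on 40 GB cards.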
