
torch.distributed.launch on eight 40G A100, CUDA out of memory. #26

zhengbiqing opened this issue Sep 14, 2023 · 0 comments

I run:
export CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7'
task=gene
datadir=data/$task
outdir=runs/$task/GPT2
name=gene0913
checkpoint=/root/siton-glusterfs-eaxtsxdfs/xts/data/BioMedLM
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --use_env run_seqcls_gpt.py \
    --tokenizer_name $checkpoint --model_name_or_path $checkpoint \
    --train_file $datadir/train.json --validation_file $datadir/dev.json --test_file $datadir/test.json \
    --do_train --do_eval --do_predict \
    --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 1 \
    --learning_rate 2e-6 --warmup_ratio 0.5 --num_train_epochs 5 --max_seq_length 32 \
    --logging_steps 1 --save_strategy no --evaluation_strategy no \
    --output_dir $outdir --overwrite_output_dir --bf16 --seed 1000 --run_name $name
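For a rough sense of what this launch asks of each GPU (my own back-of-the-envelope, not a number from this repo): torch.distributed.launch with --nproc_per_node=8 runs plain DDP, so every GPU holds a full copy of the model, its gradients, and the Adam optimizer state; nothing is sharded across the eight cards. Assuming BioMedLM's ~2.7B parameters and fp32 weights, gradients, and Adam state (roughly 16 bytes per parameter), that alone is about 43 GB per GPU, before activations and the CUDA context:

# Rough per-GPU footprint under plain DDP with Adam (assumed: fp32 weights,
# fp32 gradients, fp32 Adam m/v; activations and CUDA context come on top).
PARAMS=2700000000   # ~2.7B parameters for BioMedLM
echo "fp32 weights:             $(( PARAMS * 4  / 1000000000 )) GB"
echo "fp32 gradients:           $(( PARAMS * 4  / 1000000000 )) GB"
echo "fp32 Adam m and v:        $(( PARAMS * 8  / 1000000000 )) GB"
echo "total before activations: $(( PARAMS * 16 / 1000000000 )) GB"   # ~43 GB > 40 GB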

Even with per_device_train_batch_size 1, I still get CUDA out of memory.
Does anyone know how many GPUs are needed to fine-tune the seqcls task?
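For anyone hitting the same wall, here is a sketch of a lower-memory variant. It assumes run_seqcls_gpt.py exposes the standard HuggingFace TrainingArguments flags (not verified against this repo): gradient checkpointing trades extra compute for activation memory, i.e. the same launch with one extra flag:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --use_env run_seqcls_gpt.py \
    --tokenizer_name $checkpoint --model_name_or_path $checkpoint \
    --train_file $datadir/train.json --validation_file $datadir/dev.json --test_file $datadir/test.json \
    --do_train --do_eval --do_predict \
    --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 1 \
    --learning_rate 2e-6 --warmup_ratio 0.5 --num_train_epochs 5 --max_seq_length 32 \
    --logging_steps 1 --save_strategy no --evaluation_strategy no \
    --output_dir $outdir --overwrite_output_dir --bf16 --seed 1000 --run_name $name \
    --gradient_checkpointing

Even then, the fp32 weights, gradients, and Adam state still add up to roughly 43 GB per GPU under plain DDP, so sharding the optimizer state (for example DeepSpeed ZeRO stage 2 through the Trainer's --deepspeed option, if the script supports it) is probably what would actually make a 2.7B model fit on 40 GB cards.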
