Can LoRA SFT of a 14B model run on 8× RTX 4090s? #757
hunter-xue asked this question in Q&A · Unanswered
The error is below; it looks like an OOM error:
```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 134.00 MiB. GPU 1 has a total capacty of 23.65 GiB of which 31.62 MiB is free. Process 2136596 has 3.90 GiB memory in use. Process 2136599 has 2.60 GiB memory in use. Process 2136598 has 2.34 GiB memory in use. Process 2136597 has 3.32 GiB memory in use. Process 2136595 has 3.90 GiB memory in use. Process 2136600 has 2.34 GiB memory in use. Process 2136602 has 2.60 GiB memory in use. Including non-PyTorch memory, this process has 2.60 GiB memory in use. Of the allocated memory 2.22 GiB is allocated by PyTorch, and 1.80 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.
```
Could this be solved with tensor parallelism or some similar scheme?
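For reference, below is a minimal sketch of the direction I have in mind, assuming the Hugging Face transformers + peft + bitsandbytes stack; the checkpoint name, target_modules, and LoRA hyperparameters are placeholders, not from this project. The idea is to quantize the frozen 14B base weights to 4-bit (QLoRA-style) so a full replica fits on each 24 GiB card under plain data parallelism; sharding the model with something like DeepSpeed ZeRO-3 would be another option.

```python
# Minimal sketch, untested: QLoRA-style SFT setup so one 14B replica
# fits per 24 GiB 4090 under data parallelism (torchrun, 8 processes).
# The checkpoint name and LoRA hyperparameters below are placeholders.
import os
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

local_rank = int(os.environ.get("LOCAL_RANK", 0))

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize frozen base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # 4090 supports bf16 compute
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-14B",                     # placeholder 14B checkpoint
    quantization_config=bnb_config,
    device_map={"": local_rank},            # one full replica per process/GPU
)
model = prepare_model_for_kbit_training(model)  # also enables gradient checkpointing

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,     # placeholder hyperparameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the LoRA adapters train
```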