diff --git a/docs/source/getting_started/faqs_and_tips.md b/docs/source/getting_started/faqs_and_tips.md
index bea912cf8..b9ae058fb 100644
--- a/docs/source/getting_started/faqs_and_tips.md
+++ b/docs/source/getting_started/faqs_and_tips.md
@@ -58,6 +58,8 @@ The `epoch_size` attribute of StreamingDataset is the number of samples per epoc
 ### What's the difference between `StreamingDataset` vs. datasets vs. streams?
 `StreamingDataset` is the dataset class. It can take in multiple streams, which are just data sources. It combines these streams into a single dataset. `StreamingDataset` does not *stream* data, as continuous bytes; instead, it downloads shard files to enable a continuous flow of samples into the training job. `StreamingDataset` is an `IterableDataset` as opposed to a map-style dataset -- samples are retrieved as needed.
+### Should I wrap the Streaming DataLoader with HuggingFace Accelerate's DataLoader wrapper when training?
+When using HF Accelerate with Streaming for training, do not wrap the DataLoader, as this may cause hangs during training. StreamingDataset is ready for distributed training out of the box and does not need the wrapping that HF Accelerate provides.
 ## 🤓 Helpful Tips