Update FAQs to indicate wrapping not supported (#822)
Co-authored-by: Saaketh Narayan <[email protected]>
milocress and snarayan21 authored Nov 4, 2024
1 parent fd9b55a commit a3edf7d
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/source/getting_started/faqs_and_tips.md
@@ -58,6 +58,8 @@ The `epoch_size` attribute of StreamingDataset is the number of samples per epoc
### What's the difference between `StreamingDataset` vs. datasets vs. streams?
`StreamingDataset` is the dataset class. It can take in multiple streams, which are just data sources, and it combines these streams into a single dataset. `StreamingDataset` does not *stream* data as continuous bytes; instead, it downloads shard files to enable a continuous flow of samples into the training job. `StreamingDataset` is an `IterableDataset`, as opposed to a map-style dataset, so samples are retrieved as needed.

### Should I wrap the Streaming DataLoader with HuggingFace Accelerate's DataLoader wrapper when training?
When using HF Accelerate with Streaming for training, do not wrap the DataLoader, as this may cause hangs during training. `StreamingDataset` is ready for distributed training out of the box and does not need the wrapping that HF Accelerate provides.
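
As a rough illustration of the advice above, the sketch below passes only the model and optimizer to `Accelerator.prepare()` and leaves the DataLoader untouched. The helper names `prepare_for_training` and `example_usage`, the placeholder model, and the `remote`/`local` arguments are assumptions made for this example and are not part of the Streaming or Accelerate APIs.

```python
def prepare_for_training(accelerator, model, optimizer, dataloader):
    """Hypothetical helper: prepare model and optimizer, leave the dataloader alone."""
    # Only the model and optimizer go through accelerator.prepare().
    model, optimizer = accelerator.prepare(model, optimizer)
    # The dataloader is deliberately NOT passed to prepare(), so Accelerate
    # does not wrap it. It is returned exactly as it was given.
    return model, optimizer, dataloader


def example_usage(remote, local, batch_size=32):
    """Sketch of wiring this up; `remote` and `local` are placeholder paths."""
    # Imports are local so the helper above can be used without these
    # packages installed (for example, in a unit test with a fake accelerator).
    import torch
    from accelerate import Accelerator
    from streaming import StreamingDataset
    from torch.utils.data import DataLoader

    accelerator = Accelerator()
    dataset = StreamingDataset(
        remote=remote, local=local, batch_size=batch_size, shuffle=True
    )
    dataloader = DataLoader(dataset, batch_size=batch_size)

    # Placeholder model and optimizer, chosen only to keep the sketch short.
    model = torch.nn.Linear(8, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    return prepare_for_training(accelerator, model, optimizer, dataloader)
```

Because the DataLoader is not prepared by Accelerate in this sketch, batches are not moved to the accelerator's device automatically; the training loop would likely need to move them itself, for example with `accelerator.device`.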

## 🤓 Helpful Tips
