Baal seems to ignore eval_batch_size causing gpu memory issues #280

Open
hugocool opened this issue Oct 20, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@hugocool

Describe the bug
When setting the batch size to 2 in Baal, it appears to use a batch size of 16 instead, which causes a CUDA out-of-memory error. Despite setting per_device_eval_batch_size and train_batch_size to 2 in TrainingArguments, the predict_on_dataset function seems to use a batch size of 16. I am letting Baal sort 1e6 (1 million) examples, and when I run predict_on_dataset I see the following in the logs:

0%| | 0/62500 [00:00<?, ?it/s] 0%| | 0/62500 [00:01<?, ?it/s]

meaning it is using a batch size of 16 instead of the specified 2 (1,000,000 examples over 62,500 steps works out to 16 per step).
A batch size of 8 would also work (if I manually downsample the input dataframe to 8 inputs at a time).

To Reproduce

    from datasets import Dataset
    from transformers import TrainingArguments

    from baal.bayesian.dropout import patch_module
    from baal.transformers_trainer_wrapper import BaalTransformersTrainer

    # Enable MC-Dropout at inference time.
    model = patch_module(model)

    args = TrainingArguments(output_dir="/", per_device_eval_batch_size=2)
    args = args.set_dataloader(
        train_batch_size=2, eval_batch_size=2, auto_find_batch_size=False
    )

    trainer = BaalTransformersTrainer(
        model=model,
        args=args,
    )

    dataset = Dataset.from_pandas(tokenized_X)
    predictions = trainer.predict_on_dataset(dataset, iterations=30)

which gives:

"CUDA out of memory. Tried to allocate 1.41 GiB (GPU 0; 15.78 GiB total capacity; 14.49 GiB already allocated; 397.75 MiB free; 14.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"

Expected behavior
The predict_on_dataset function should respect the batch_size specified in TrainingArguments and not cause a CUDA out of memory error.

Version (please complete the following information):

  • OS: Ubuntu 22
  • Python: 3.10.4
  • Baal:
    version: 1.9.1
    description: Library to enable Bayesian active learning in your research or labeling work.

dependencies

  • h5py >=3.4.0,<4.0.0
  • matplotlib >=3.4.3,<4.0.0
  • numpy >=1.21.2,<2.0.0
  • Pillow >=6.2.0
  • scikit-learn >=1.0.0,<2.0.0
  • scipy >=1.7.1,<2.0.0
  • structlog >=21.1.0,<22.0.0
  • torch >=1.6.0
  • torchmetrics >=0.9.3,<0.10.0
  • tqdm >=4.62.2,<5.0.0

Additional context
I am running this on AWS batch on a p3 instance.

hugocool added the bug label on Oct 20, 2023
@Dref360 (Member) commented Oct 20, 2023

Hello,

Thank you for submitting the issue. I should be able to take a look over the weekend.

I'm a bit puzzled because we simply call self.get_eval_dataloader(dataset) which is managed by HuggingFace. I'll know more this weekend.
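
For reference, a quick way to sanity-check which batch size the eval dataloader actually uses (an illustrative sketch, assuming the trainer and dataset from the report above):

    dl = trainer.get_eval_dataloader(dataset)
    # If the setting is respected, len(dl) * per_device_eval_batch_size should be
    # roughly len(dataset): 1,000,000 examples at batch size 2 means 500,000 steps,
    # whereas the reported 62,500 steps implies an effective batch size of 16.
    print(len(dataset), len(dl), trainer.args.per_device_eval_batch_size)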

@hugocool (Author) commented Oct 20, 2023 via email

@Dref360 (Member) commented Oct 21, 2023

I was thinking about it, and maybe it's because of the stacking we perform.
For our HF implementation, we always perform MC-Dropout in a single pass, meaning that batch_size=2 results in a batch of 2 * ITERATIONS being fed to the model. You said that a manual batch size of 8 is your maximum, so 2 * 30 = 60 is too much.

Our ModelWrapper implementation has a flag replicate_in_memory which avoids this stacking, but we don't have it for the HF wrapper.

It is fairly trivial to add this feature, so I'll do that.
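
To make the memory impact concrete, here is a minimal sketch (not Baal's actual code; the helper name is made up) of what this in-memory stacking amounts to for a HF classifier batch:

    import torch

    def stacked_mc_dropout_step(model, batch, iterations: int):
        # Tile every example `iterations` times along the batch dimension so all
        # MC-Dropout samples are computed in a single forward pass.
        ids = batch["input_ids"].repeat_interleave(iterations, dim=0)        # [batch_size * iterations, seq_len]
        mask = batch["attention_mask"].repeat_interleave(iterations, dim=0)
        with torch.no_grad():
            logits = model(input_ids=ids, attention_mask=mask).logits
        # Regroup into [batch_size, num_labels, iterations]; with batch_size=2 and
        # iterations=30, the forward pass above saw 60 rows at once.
        num_labels = logits.shape[-1]
        return logits.view(-1, iterations, num_labels).permute(0, 2, 1)

So the activation memory of a single pass scales with batch_size * iterations, which is why even per_device_eval_batch_size=2 can exhaust the GPU at 30 iterations.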

Progress bar stuff

I just tested the progress bar problem and it seems to work. 🤔

from datasets import load_dataset
from transformers import pipeline, TrainingArguments, DataCollatorWithPadding

from baal.transformers_trainer_wrapper import BaalTransformersTrainer

TEXT_COL = 'sentence'
ds = load_dataset('sst2')['test'].remove_columns('label')
pipe = pipeline('text-classification', model='distilbert-base-uncased-finetuned-sst-2-english')
tokenizer = pipe.tokenizer
model = pipe.model


def preprocess_function(examples):
    return tokenizer(examples[TEXT_COL], truncation=True)


tokenized_ds = ds.map(preprocess_function, batched=True)

training_args = TrainingArguments(
    output_dir='/tmp',
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer = BaalTransformersTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
print("Total examples", len(tokenized_ds))
print(
    f"Dataloader length={len(trainer.get_eval_dataloader(tokenized_ds))}, "
    f"batch_size={training_args.per_device_eval_batch_size}"
)
trainer.predict_on_dataset(tokenized_ds, iterations=10)

@Dref360 (Member) commented Oct 21, 2023

I just opened #281 which should allow you to run your experiment.

If you can install Baal from source from this branch, you could update your code with:

trainer = BaalTransformersTrainer(
    model=model,
    replicate_in_memory=False,
    args=args,
)

and that should fix it.
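
For comparison, here is a minimal sketch (again illustrative, not Baal's implementation; the helper name is made up) of what disabling in-memory replication amounts to: the same batch is pushed through the model `iterations` times in a loop, so peak activation memory scales with batch_size alone rather than batch_size * iterations.

    import torch

    def looped_mc_dropout_step(model, batch, iterations: int):
        outputs = []
        with torch.no_grad():
            for _ in range(iterations):
                # Dropout stays active at inference time because the model was
                # patched with patch_module, so each pass draws a new sample.
                outputs.append(model(**batch).logits)
        # Stack into [batch_size, num_labels, iterations] to match the output layout
        # of predict_on_dataset.
        return torch.stack(outputs, dim=-1)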

In any case, I should be able to get the PR merged this week and will release a minor version along with it :)

@hugocool (Author) commented Oct 21, 2023

I'm sorry for any miscommunication; what I meant by manually setting the batch_size to 8 is the following:

    import numpy as np

    iterations = 30
    predictions = np.empty((0, model.num_labels, iterations))

    # Feed the data to predict_on_dataset in small chunks so that only a handful
    # of rows are stacked per MC-Dropout pass.
    for chunk in df_chunker(tokenized_X, batch_size=2):
        dataset = Dataset.from_pandas(chunk)
        # shape: (len(chunk), model.num_labels, iterations)
        _predictions = trainer.predict_on_dataset(dataset, iterations=iterations)
        predictions = np.concatenate((predictions, _predictions), axis=0)

where

from typing import Generator

import pandas as pd


def df_chunker(
    df: pd.DataFrame, batch_size: int = 1000
) -> Generator[pd.DataFrame, None, None]:
    """
    Splits a pandas DataFrame into smaller chunks of a specified batch size.

    Args:
        df (pandas.DataFrame): The DataFrame to be split.
        batch_size (int): The number of rows in each chunk.

    Yields:
        pandas.DataFrame: A chunk of the original DataFrame with the specified number of rows.
    """
    for i in range(0, len(df), batch_size):
        yield df.iloc[i : i + batch_size]

So the iterations are still 30 and my max batch size is 8, so the number of inputs being loaded into the model is 8 * 30.
I'm basically forcing the predict function to only take 8 inputs at a time.
However, when I don't force-chunk the batch size, it seems to predict in much larger batches, which causes memory overflows.
What is so weird about this bug is that it might not even be Baal related; it might just seem so because of the progress bar.

Anyway, I'll install Baal from #281 and see whether that removes the need for my forced chunking workaround.
Thanks!

@parmidaatg (Collaborator)

Hi @hugocool, wanted to check whether the above issue was resolved with the fix from #281?
