How we can use Baal for NER using huggingface #262
Comments
Hello, I'm not super familiar with NER, but Baal should be able to handle it easily. Here is a script that loads a NER model and outputs predictions. We use MC-Dropout to get the predictions and BALD to estimate the uncertainty. Note that to get the uncertainty, we need to swap the axes so that the probabilities are on axis 1.

from datasets import load_dataset
from transformers import pipeline
from baal.active.heuristics import BALD
from baal.bayesian.dropout import patch_module
from baal.transformers_trainer_wrapper import BaalTransformersTrainer
dataset = load_dataset("conll2003")
# Use a distinct name so the imported `pipeline` function is not shadowed.
ner_pipeline = pipeline('ner', model='issifuamajeed/distilbert-base-uncased-finetuned-ner')
tokenizer = ner_pipeline.tokenizer
# Apply MC-Dropout
model = patch_module(ner_pipeline.model)
trainer = BaalTransformersTrainer(model=model)
def preprocess(example):
    results = tokenizer(" ".join(example['tokens']), max_length=50,
                        truncation=True, padding='max_length')
    return results
tokenized_dataset = dataset.map(preprocess)
# Shape [Batch_size, Num-Tokens, Probabilities, Iterations]
predictions = trainer.predict_on_dataset(tokenized_dataset['test'], iterations=10)
# Predictions with Class first [batch_size, Probabilities, Num Tokens, Iteration]
next_to_label = BALD(reduction='sum')(predictions.swapaxes(1, 2))
uncertainties = BALD().get_uncertainties(predictions.swapaxes(1, 2))

I hope this helps!
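To make the axis swap above concrete, here is a minimal NumPy-only sketch of a BALD-style score on token-level predictions. It is not Baal's implementation: the function name `bald_scores`, the toy probabilities, and the `eps` smoothing term are all assumptions made for illustration, and it assumes the inputs are already probabilities normalised over the class axis.

```python
import numpy as np


def bald_scores(probs, eps=1e-12):
    """BALD-style score per sample (illustrative, not Baal's code).

    probs: array of shape [batch, classes, tokens, iterations],
           already normalised over the class axis.
    """
    # Entropy of the prediction averaged over MC iterations: [batch, tokens]
    mean_p = probs.mean(axis=-1)
    entropy_of_mean = -(mean_p * np.log(mean_p + eps)).sum(axis=1)
    # Average entropy of each individual MC iteration: [batch, tokens]
    mean_entropy = -(probs * np.log(probs + eps)).sum(axis=1).mean(axis=-1)
    per_token = entropy_of_mean - mean_entropy
    # Sum over tokens, mirroring reduction='sum' in the script above.
    return per_token.sum(axis=-1)


# Toy data: 2 sentences, 3 tokens, 2 classes, 4 MC iterations.
confident = np.array([0.9, 0.1])
flipped = np.array([0.1, 0.9])
agree = np.stack([confident] * 4, axis=-1)                               # [classes, iterations]
disagree = np.stack([confident, flipped, confident, flipped], axis=-1)   # [classes, iterations]

# Token-first layout: [batch, tokens, classes, iterations]
token_first = np.stack([np.stack([agree] * 3), np.stack([disagree] * 3)])
# Class-first layout: [batch, classes, tokens, iterations]
class_first = token_first.swapaxes(1, 2)

scores = bald_scores(class_first)
# Indices ordered from most to least uncertain.
ranks = np.argsort(scores)[::-1]
print(class_first.shape, scores, ranks)
```

In this toy case the second sentence, whose MC iterations disagree with each other, gets the larger score and is ranked first, while the sentence whose iterations all agree scores close to zero.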
Hi @Dref360, so next_to_label contains a list of indexes. Are these in sorted order, i.e., if the list contains [4,5,1,2], does that mean 4 has the highest uncertainty? How can we use this in ActiveLearningLoop?
Hello,
I had to make some modifications to Baal to get ActiveLearningLoop working on NER. I opened #263 to fix that. I edited my code to showcase a complete active learning experiment on NER.
Hi @Dref360,
If we add this piece of code, does it mean we
Oh, I made a mistake in the Gist, now updated. We first load the initial weights before training.

for _ in range(2):
    trainer.load_state_dict(init_weights)
    print(f"Active learning: labelled={al_dataset.n_labelled} unlabelled={al_dataset.n_unlabelled}")
    trainer.train()
    trainer.lr_scheduler = None
    trainer.evaluate()
    loop.step()

For your question: The first step will train on 100 and label 100 more. Then we will train on 200, label 100, etc.
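The schedule described above can be sketched in plain Python, without Baal or Transformers. Everything here is hypothetical: `ToyTrainer`, the pool size of 1000, and the integer "weight" are stand-ins chosen only to show how the labelled set grows and why the initial weights are reloaded before each round.

```python
import copy


class ToyTrainer:
    """Hypothetical stand-in for a trainer; not part of Baal."""

    def __init__(self):
        self.weights = {"w": 0}

    def state_dict(self):
        return copy.deepcopy(self.weights)

    def load_state_dict(self, state):
        self.weights = copy.deepcopy(state)

    def train(self, n_labelled):
        # Pretend training shifts the weight by the amount of data seen.
        self.weights["w"] += n_labelled


def run(n_total=1000, n_initial=100, query_size=100, n_steps=2):
    trainer = ToyTrainer()
    init_weights = trainer.state_dict()
    n_labelled = n_initial
    history = []
    for _ in range(n_steps):
        # Reset so each round trains from the same starting point.
        trainer.load_state_dict(init_weights)
        trainer.train(n_labelled)
        history.append((n_labelled, n_total - n_labelled, trainer.weights["w"]))
        # Equivalent of one labelling step: move query_size items to the labelled set.
        n_labelled += min(query_size, n_total - n_labelled)
    return history


history = run()
for labelled, unlabelled, w in history:
    print(f"labelled={labelled} unlabelled={unlabelled} w={w}")
```

Without the reset, the second round in this toy would start from the weights left by the first round, so its result would depend on the earlier, smaller labelled set as well.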