How do I fine-tune this model?

#12
by ethanjyx - opened

Hey there, I am interested in fine-tuning bert-base-uncased. How can I do it?

I found this tutorial https://huggingface.co/docs/transformers/training, but it focuses on fine-tuning a prediction head rather than the backbone weights.

I would like to:

  1. fine-tune the backbone weights on a large corpus of text from my domain,
  2. train a prediction head on a more limited dataset from my domain.

Is that possible?

Hey Ethan! Would love to chat on this if you have a few minutes to spare. Please let me know :)

@nikharanirghin What do you want to chat about?

Feedback on finetuning bert!

Can you help me too? I want to fine-tune this model as well.

You can fine-tune the entire model by simply training it end to end with a very low learning rate.
When you load the pretrained model, every parameter has requires_grad = True by default, so all layers are updated during back-propagation; model.train() only switches layers such as dropout into training mode.
For example:

def single_training_epoch(model, optimizer, train_dataloader):
    model.train()
    # Loop over the training set
    for input_ids, attention_masks, labels in train_dataloader:
        # Clear the gradients
        optimizer.zero_grad()
        # Forward pass
        outputs = model(input_ids, attention_mask=attention_masks, labels=labels)
        loss = outputs[0]
        # Backward pass
        loss.backward()
        optimizer.step()
    return model, optimizer
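
To make "a very low learning rate" a bit more concrete, here is a minimal sketch of an optimizer that uses a small learning rate for the backbone and a larger one for the head. It assumes the model exposes `bert` and `classifier` attributes, as `BertForSequenceClassification` does. `TinyClassifier` is a hypothetical stand-in that only mimics that layout so the snippet runs without downloading weights, and the values 2e-5 and 1e-3 are illustrative assumptions, not numbers from this thread.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in with the same attribute names as BertForSequenceClassification."""

    def __init__(self, hidden_size=16, num_labels=2):
        super().__init__()
        self.bert = nn.Linear(hidden_size, hidden_size)       # placeholder for the backbone
        self.classifier = nn.Linear(hidden_size, num_labels)  # placeholder for the head

    def forward(self, features):
        return self.classifier(torch.tanh(self.bert(features)))


model = TinyClassifier()

# One parameter group per part of the model, each with its own learning rate.
optimizer = torch.optim.AdamW(
    [
        {"params": model.bert.parameters(), "lr": 2e-5},        # assumed backbone rate
        {"params": model.classifier.parameters(), "lr": 1e-3},  # assumed head rate
    ],
    weight_decay=0.01,
)

for group in optimizer.param_groups:
    print(len(group["params"]), group["lr"])
```

The resulting optimizer can be passed to a loop like single_training_epoch above unchanged; with a real model you would build the groups from its own `bert` and `classifier` modules.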

You can also set requires_grad manually on each parameter (True = updated during training, False = frozen during back-prop) using a loop:

for param in model.bert.parameters():
    param.requires_grad = True
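
As a rough sketch of the opposite case, the same loop with False freezes the backbone so that only the head trains, and counting trainable parameters is a quick way to check that it took effect. `TinyClassifier`, `set_backbone_trainable`, and `count_trainable` are hypothetical names used for illustration; the stand-in model only mirrors the `bert` / `classifier` attribute layout.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in with a `bert` backbone and a `classifier` head."""

    def __init__(self, hidden_size=16, num_labels=2):
        super().__init__()
        self.bert = nn.Linear(hidden_size, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, features):
        return self.classifier(torch.tanh(self.bert(features)))


def set_backbone_trainable(model, trainable):
    """Turn gradient updates on or off for every backbone parameter."""
    for param in model.bert.parameters():
        param.requires_grad = trainable


def count_trainable(model):
    """Number of scalar weights that will receive gradient updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = TinyClassifier()
total = count_trainable(model)        # everything trainable by default

set_backbone_trainable(model, False)  # freeze the backbone
head_only = count_trainable(model)    # only the head is left

# Hand the optimizer only the parameters that are still trainable.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
print(total, head_only)
```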

My recommendation from there is to save the fine-tuned model and then train a separate model that takes BERT's output as its input.
This enables a few things:

  • Much faster training and retraining of your 'prediction head' model, since you run the data through the BERT model once and then train the smaller model on the saved outputs.
  • Easier retraining and experimentation with different 'prediction head' architectures, without touching the BERT model at all.
  • The ability to feed additional values into your model (such as ints and floats from other data fields, or features you have created yourself).
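
The workflow in these bullets could look roughly like the sketch below: run the frozen encoder once, cache its outputs, concatenate any extra numeric features, and train a small head on the result. Everything here is an assumption for illustration: `encoder` is a tiny stand-in for the saved fine-tuned BERT model, and the random tensors stand in for real tokenized text, extra columns, and labels.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the saved, fine-tuned BERT encoder.
encoder = nn.Sequential(nn.Linear(32, 16), nn.Tanh())
encoder.eval()

raw_inputs = torch.randn(64, 32)      # placeholder for tokenized text
extra_features = torch.randn(64, 3)   # placeholder for numeric fields
labels = torch.randint(0, 2, (64,))   # placeholder class labels

# Step 1: run the data through the encoder a single time and keep the outputs.
with torch.no_grad():
    cached = encoder(raw_inputs)

# Step 2: append the extra features to the cached encoder outputs.
head_inputs = torch.cat([cached, extra_features], dim=1)  # shape (64, 19)

# Step 3: train only the small head; the encoder is never touched again.
head = nn.Sequential(nn.Linear(19, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

losses = []
for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(head(head_inputs), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Because the head only ever sees `head_inputs`, swapping in a different head architecture means rerunning step 3 alone.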

The one drawback is slightly slower inference time... but this can be mitigated by creating a proper pipeline (or a more advanced method would be to load them separately with their weights and merge the models together).

May I know what the output shape of the model is?
When I train it, I get this error:
Target size (torch.Size([8, 6])) must be the same as input size (torch.Size([8, 2]))

I want to adjust the input shape to match the expected shape.
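
That message is the one PyTorch's `BCEWithLogitsLoss` raises when the logits (the "input") and the targets have different shapes. One plausible reading, assuming the labels are 6-column multi-hot vectors and the classification head was left at 2 outputs, is that the head width needs to match the label width. With `transformers` this is usually done by passing `num_labels=6` (and, for multi-label data, `problem_type="multi_label_classification"`) to `from_pretrained`. The sketch below reproduces the mismatch and the fix with plain linear layers standing in for the head; the sizes are taken from the error message, everything else is hypothetical.

```python
import torch
from torch import nn

batch_size, hidden_size, num_labels = 8, 16, 6

pooled = torch.randn(batch_size, hidden_size)  # placeholder for BERT's pooled output
targets = torch.randint(0, 2, (batch_size, num_labels)).float()  # multi-hot labels
loss_fn = nn.BCEWithLogitsLoss()

# A head with 2 outputs produces logits of shape (8, 2), which cannot be
# compared element-wise with targets of shape (8, 6).
wrong_head = nn.Linear(hidden_size, 2)
try:
    loss_fn(wrong_head(pooled), targets)
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
print(mismatch_error)

# A head with one output per label produces logits of shape (8, 6).
right_head = nn.Linear(hidden_size, num_labels)
loss = loss_fn(right_head(pooled), targets)
print(loss.item())
```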
