How did you fix the issues within the tutorial?

#3
by Skrunbger - opened

I followed the exact same tutorial. I have semi-successfully fixed them, but I was wondering how you got around them?
E.g.:
Training raised errors because the pad token attribute didn't exist at all, rather than just being set to None.
pytorch_model.bin never appeared, so I called torch.save(model) manually <-- I'm not even sure this is a proper fix, since my model starts to mentally implode after 4 consecutive messages every time.

(I set the 'messages' parameter to a circular queue. Anything more than 4 will cause it to spam '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')

Can you tell me which part you're at? I assume you've finished training and are starting to use the model, or is it before that?

I've managed to follow through the entire tutorial.

Again, here are some corrections I made:
In both collate functions, I commented out the if statement and the padding_value branch, and directly returned the plain pad_sequence call
(for me, when training both the small and medium models, tokenizer._pad_token doesn't exist at all)

def collate(examples: List[torch.Tensor]):
    # if tokenizer._pad_token is None:
    return pad_sequence(examples, batch_first=True)
    # return pad_sequence(examples, batch_first=True, padding_value=tokenizer.pad_token_id)
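For comparison, here's a sketch that keeps the tutorial's padding behaviour but guards against the missing attribute by checking the public pad_token with getattr instead of the private _pad_token (the tokenizer here is assumed to be a transformers-style tokenizer):

```python
from typing import List

import torch
from torch.nn.utils.rnn import pad_sequence


def make_collate(tokenizer):
    """Build a collate_fn that pads with the tokenizer's pad token if it has one."""

    def collate(examples: List[torch.Tensor]) -> torch.Tensor:
        # getattr guards against tokenizer versions where the attribute is
        # missing entirely, which is the error the private `tokenizer._pad_token`
        # check raises.
        if getattr(tokenizer, "pad_token", None) is None:
            return pad_sequence(examples, batch_first=True)
        return pad_sequence(
            examples, batch_first=True, padding_value=tokenizer.pad_token_id
        )

    return collate
```

This way the function degrades to zero-padding when no pad token exists, instead of crashing.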

After training, the pytorch_model.bin didn't appear in my drive.

I fixed this by directly calling
torch.save(model, os.path.join(args.output_dir, "pytorch_model.bin"))
in the main function (after setting args to skip training)
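For what it's worth, torch.save(model, ...) pickles the whole nn.Module object, which from_pretrained() can't load back; pytorch_model.bin normally contains just the state dict. A minimal sketch of saving it that way (the Linear model here is only a stand-in for the fine-tuned model, and for transformers models model.save_pretrained(output_dir) does this plus writes the config):

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for the fine-tuned model from the tutorial.
model = nn.Linear(4, 2)
output_dir = tempfile.mkdtemp()

# Save only the weights (the state dict), which is what pytorch_model.bin
# normally contains, rather than pickling the whole module object.
path = os.path.join(output_dir, "pytorch_model.bin")
torch.save(model.state_dict(), path)
```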

After doing that, my bot functions fine-ish. If I try to use the endpoint on this website, the bot replies okay for about 4 replies, then starts spamming exclamation marks.
This happens every time.

I used the InferenceClient instead of whatever the tutorial provided. I then used a circular queue to store message history.

response = self.api_client.chat.completions.create(
    model="...",
    messages=list(self.message_history),
    max_tokens=500,
    temperature=0.8,
    stream=False,
)
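In case it helps, the circular queue can just be a collections.deque with maxlen set: once full, appending a new message silently drops the oldest one. A minimal sketch with illustrative names:

```python
from collections import deque

# maxlen=4 matches the history size described above; a deque with maxlen
# behaves as a circular queue.
message_history = deque(maxlen=4)

for i in range(6):
    message_history.append({"role": "user", "content": f"message {i}"})

# Only the 4 most recent messages survive.
print([m["content"] for m in message_history])
```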

If I set the maxlen of the circular queue to more than 4, the bot just outputs a wall of exclamation marks (screenshot: image.png).

I'm asking because I'm unsure whether it's because of my weird hack of directly calling torch.save, or just that the training data I gave it was bad.

I extracted speech from an ao3 fanfic html file.
I replaced all narration with "CONTEXT: blah blah"
I replaced all unknown speakers with "UNKNOWN: blah blah"
and if I knew which character was speaking, I used "HANA: blah blah"

Correction:

In the final dataset I used, I removed all "CONTEXT" lines; all that remains is characters speaking to each other.
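That filtering step can be sketched like this (the helper name and sample lines are made up for illustration):

```python
def keep_dialogue(lines):
    """Drop "CONTEXT:" narration lines, keeping only speaker lines."""
    return [line for line in lines if not line.startswith("CONTEXT:")]


raw = [
    "CONTEXT: the hallway is empty",
    "HANA: blah blah",
    "UNKNOWN: blah blah",
]
print(keep_dialogue(raw))
```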
