How did you fix the issues within the tutorial?

#3
by Skrunbger - opened

I followed the exact same tutorial. I have semi-successfully fixed them, but I was wondering how you got around them?
E.g.:
Training raised errors because the pad token attribute didn't exist at all, rather than just being set to None.
pytorch_model.bin never appeared, so I called torch.save(model) manually <-- I'm not even sure this is a proper fix, since my model starts to mentally implode after 4 consecutive messages every time.

(I set the 'messages' parameter to a circular queue. Anything more than 4 will cause it to spam '!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')

Can you tell me which part you're at? I assume you've finished training and are starting to use the model, or is it before that?

I've managed to follow through the entire tutorial.

Again, here are some corrections I made:
In both collate functions, I commented out the if statement and the padding_value branch, and directly returned the plain pad_sequence call
(for me, when training both the small and medium models, tokenizer._pad_token doesn't exist at all)

def collate(examples: List[torch.Tensor]):
    # if tokenizer._pad_token is None:
    return pad_sequence(examples, batch_first=True)
    # return pad_sequence(examples, batch_first=True, padding_value=tokenizer.pad_token_id)
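For comparison, here's a sketch that keeps the tutorial's padding behaviour but guards against the missing attribute by checking the public pad_token with getattr instead of the private _pad_token (the tokenizer here is assumed to be a transformers-style tokenizer):

```python
from typing import List

import torch
from torch.nn.utils.rnn import pad_sequence


def make_collate(tokenizer):
    """Build a collate_fn that pads with the tokenizer's pad token if it has one."""

    def collate(examples: List[torch.Tensor]) -> torch.Tensor:
        # getattr guards against tokenizer versions where the attribute is
        # missing entirely, which is the error the private `tokenizer._pad_token`
        # check raises.
        if getattr(tokenizer, "pad_token", None) is None:
            return pad_sequence(examples, batch_first=True)
        return pad_sequence(
            examples, batch_first=True, padding_value=tokenizer.pad_token_id
        )

    return collate
```

This way the function degrades to zero-padding when no pad token exists, instead of crashing.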

After training, the pytorch_model.bin didn't appear in my drive.

I fixed this by directly calling
torch.save(model, os.path.join(args.output_dir, "pytorch_model.bin"))
in the main function (after setting args to skip training)
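For what it's worth, torch.save(model, ...) pickles the whole nn.Module object, which from_pretrained() can't load back; pytorch_model.bin normally contains just the state dict. A minimal sketch of saving it that way (the Linear model here is only a stand-in for the fine-tuned model, and for transformers models model.save_pretrained(output_dir) does this plus writes the config):

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for the fine-tuned model from the tutorial.
model = nn.Linear(4, 2)
output_dir = tempfile.mkdtemp()

# Save only the weights (the state dict), which is what pytorch_model.bin
# normally contains, rather than pickling the whole module object.
path = os.path.join(output_dir, "pytorch_model.bin")
torch.save(model.state_dict(), path)
```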

After doing that, my bot functions fine-ish. If I try to use the endpoint on this website, the bot replies okay for about 4 replies, then starts spamming exclamation marks.
This happens every time.

I used the InferenceClient instead of whatever the tutorial provided. I then used a circular queue to store message history.

response = self.api_client.chat.completions.create(
    model="...",
    messages=list(self.message_history),
    max_tokens=500,
    temperature=0.8,
    stream=False,
)
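In case it helps, the circular queue can just be a collections.deque with maxlen set: once full, appending a new message silently drops the oldest one. A minimal sketch with illustrative names:

```python
from collections import deque

# maxlen=4 matches the history size described above; a deque with maxlen
# behaves as a circular queue.
message_history = deque(maxlen=4)

for i in range(6):
    message_history.append({"role": "user", "content": f"message {i}"})

# Only the 4 most recent messages survive.
print([m["content"] for m in message_history])
```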

If I set the maxlen of the circular queue to more than 4, the bot just outputs a wall of exclamation marks (screenshot: image.png).

I'm asking because I'm unsure whether it's because of my weird hack of directly calling torch.save, or just that the training data I gave it was bad.

I extracted speech from an ao3 fanfic html file.
I replaced all narration with "CONTEXT: blah blah"
I replaced all unknown speakers with "UNKNOWN: blah blah"
and if I knew which character was speaking, I used "HANA: blah blah"

Correction:

In the final dataset I used, I removed all "CONTEXT" lines; all that remains is characters speaking to each other.
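That filtering step can be sketched like this (the helper name and sample lines are made up for illustration):

```python
def keep_dialogue(lines):
    """Drop "CONTEXT:" narration lines, keeping only speaker lines."""
    return [line for line in lines if not line.startswith("CONTEXT:")]


raw = [
    "CONTEXT: the hallway is empty",
    "HANA: blah blah",
    "UNKNOWN: blah blah",
]
print(keep_dialogue(raw))
```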
