Maybe SFT for better Chat ability or RLHF or function call soon?

#22
by Yhyu13 - opened

Hi, this model answered my tricky question correctly, which no other 34B model I tried could do (they assume 1010 A.D. is in the future).

msedge_b4BrqkjJ9E.png

BUT I do not like its output formatting; at least Yi-34B does not follow my "step by step" reasoning instructions.

I do like its tone, but I still find it is not as human-preferred as other RLHF-ed models.

BTW, the testing env is
Latest textgen-webui
Latest exllamav2
TheBloke/Yi-34B-GPTQ

Here is an output from XWin-13B-v0.2, which was my favorite RLHF-ed model.

Even though it failed to figure out that 1010 A.D. is in the past, I find its answer very structured, easy to read, and easy to follow.

Screenshot_2023-11-11_11-14-34.png

Yhyu13 changed discussion title from Maybe SFT for better Chat ability or RLHF soon? to Maybe SFT for better Chat ability or RLHF or function call soon?

Also looking forward to seeing future progress on function calling abilities.

Dudes, it is just the most essential part for recently released models to catch up with GPTs.

This is a base model, so I am not sure why you are expecting it to behave like a chat-instruct model. The 01.ai team said that they are working on a chat fine-tune, which might give it that assistant-like vibe. Having base pre-trained models that are not RLHF-ed is essential to allow later customization such as RLHF. The Yi model architecture makes it a GPT; OpenAI doesn't have a monopoly on that word.

Yhyu13 changed discussion status to closed

@adamo1139

Now there are actually some SFT chat models out there; this one is from https://huggingface.co./TheBloke/Nous-Capybara-34B-GGUF

with textgen-webui

MODEL=Nous-Capybara-Yi-34B-200K-GPTQ
python server.py --model $MODEL \
    --loader exllamav2 \
    --max_seq_len 8192

The English ability is way, way ahead of any open-source model I have seen (forgive my ignorance!); it is so prudent and highly intelligent. There is a weird ending token </s>, though, probably because the prompt template is not fully supported.

msedge_UiPxuQZ7an.png

Yhyu13 changed discussion status to open

I have the same issue with the </s> token being printed at the end of the reply when running my own QLoRA fine-tune. It's because the dataset is made for Llama, where this is the default EOS token, but the model is trained on Yi, where the EOS token is <|endoftext|>. I bet that's the same issue as with Nous Capybara. I haven't tried this fine-tune yet. This model is a good base for fine-tuning.

it's because the dataset is made for llama, where this is the default EOS token, but it's trained on Yi where EOS token is <|endoftext|>. I bet that's the same issue as with Nous Capybara.

I also think this is the most likely reason.
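For anyone hitting the same thing, here is a minimal sketch of the two workarounds implied above, assuming the dataset samples contain the literal string </s> and the target EOS string is <|endoftext|> as described in this thread. The helper names (retarget_eos, strip_trailing_eos) and the sample text are hypothetical and only for illustration; this is plain string handling, not part of any library or of the Nous Capybara training setup.

```python
# Assumption: EOS strings as discussed in this thread.
LLAMA_EOS = "</s>"          # EOS string the dataset was written with
YI_EOS = "<|endoftext|>"    # EOS string reported for Yi in this thread


def retarget_eos(text, source_eos=LLAMA_EOS, target_eos=YI_EOS):
    """Replace the literal source EOS string in a training sample
    with the target model's EOS string (hypothetical helper)."""
    return text.replace(source_eos, target_eos)


def strip_trailing_eos(reply, eos_strings=(LLAMA_EOS, YI_EOS)):
    """Remove any known EOS strings left at the end of a generated
    reply (hypothetical helper for post-processing output)."""
    cleaned = reply.rstrip()
    changed = True
    while changed:
        changed = False
        for eos in eos_strings:
            if cleaned.endswith(eos):
                cleaned = cleaned[: -len(eos)].rstrip()
                changed = True
    return cleaned


# Made-up sample, only to show the shape of the data.
samples = ["USER: Hi ASSISTANT: Hello!</s>"]
fixed = [retarget_eos(s) for s in samples]
print(fixed)
print(strip_trailing_eos("The answer is 42.</s>"))
```

Rewriting the dataset before fine-tuning addresses the cause, while stripping the token from replies only hides the symptom, so the first option is probably preferable when retraining is possible.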

FancyZhao changed discussion status to closed
