Info about adding custom functions?
Hi, first of all, I wanted to congratulate you guys on the research. Really impressive stuff!
I was wondering if you could provide me with some info about how I can start the process of adding my own functions to the vocabulary of the model.
Say I have 20 new functions I would like to teach the model. What would be the steps that you take to get this done?
Hi, as long as you have a well-defined task, you can formulate the function. You can label the data manually (which is costly), or you can use synthetic data. Please refer to our paper.
I read through it and feel like I'm missing information about:
How exactly was the vocabulary extended?
From what I found online, there are multiple ways to extend the vocabulary. So I was wondering what exactly you guys did?
After extending the vocabulary, do the embedding and lm_head layers need to be retrained?
In the paper it's mentioned that after extending the vocabulary, you go through a round of fine-tuning. But from my understanding, fine-tuning won't train the lm_head and embedding layers.
So what was done to train the two layers mentioned above?
Please stay tuned. We will open-source the code later. For the earliest notification, consider joining our waitlist: https://www.nexa4ai.com/contact
Curious to check out the open-source codebase soon to learn the details!
Hi @TeddyB
- We add functional tokens to the vocabulary; see https://huggingface.co./NexaAIDev/Octopus-v2/blob/main/tokenizer_config.json
- We will prepare a training pipeline on AWS / Google Cloud soon for customized API training requirements.
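For anyone following along, here is a minimal sketch of what adding functional tokens to a vocabulary involves in general. It is not the Octopus training code, which has not been released. The token names (`<fn_0>` ... `<fn_end>`), the helper `add_functional_tokens`, and the mean-plus-noise initialization are all assumptions made for illustration. The point is only that each new token gets a fresh ID and a fresh embedding row, and that row carries no learned meaning until it is trained.

```python
import random


def add_functional_tokens(vocab, embeddings, new_tokens, seed=0):
    """Append new tokens to `vocab` and one embedding row per new token.

    Assumes `vocab` maps token -> id with contiguous ids 0..n-1 and that
    `embeddings` has exactly one row per id. Hypothetical helper, for
    illustration only.
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    # Column-wise mean of the existing rows, computed before any row is added.
    mean = [sum(row[j] for row in embeddings) / len(embeddings) for j in range(dim)]
    added = []
    for tok in new_tokens:
        if tok in vocab:
            continue  # already present, keep its existing id and row
        vocab[tok] = len(vocab)  # next free id
        # Assumed initialization: mean of existing rows plus small noise.
        embeddings.append([m + rng.gauss(0.0, 0.01) for m in mean])
        added.append(tok)
    return added


# Toy vocabulary and 2-dimensional embedding table.
vocab = {"<pad>": 0, "hello": 1, "world": 2}
embeddings = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]

# 20 hypothetical functional tokens plus a hypothetical end marker.
new_tokens = [f"<fn_{i}>" for i in range(20)] + ["<fn_end>"]
added = add_functional_tokens(vocab, embeddings, new_tokens)

print(len(vocab), vocab["<fn_0>"], vocab["<fn_end>"])  # 24 3 23
```

With Hugging Face Transformers, the corresponding steps are usually `tokenizer.add_special_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. Whether the new rows are then updated depends on which parameters the fine-tuning setup leaves trainable.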
Hi @zackli4ai ,
Thanks for the info, I see the new special tokens added to the tokenizer now
I have some follow-up questions:
- Have you tried your technique of adding new functional tokens to other base models, like MS Phi-3 Mini or Meta Llama 2 8b?
- Are you also planning on releasing the dataset you used to train the model?
- Yes, Octopus-V4 is based on Phi-3: https://huggingface.co./NexaAIDev/Octopus-v4
- We are building a training pipeline on AWS / Google Cloud
Thanks for the questions.