Info about adding custom functions?

#14
by TeddyB - opened

Hi, first of all, I wanted to congratulate you guys on the research. Really impressive stuff!

I was wondering if you could provide me with some info about how I can start the process of adding my own functions to the vocabulary of the model.

Say I have 20 new functions I would like to teach the model. What would be the steps that you take to get this done?

Hi, as long as you have a well-defined task, you can formulate the function. You can label the data manually (which is costly), or you can use synthetic data. Please refer to our paper.

I read through it and feel like I'm missing information about:

  1. How exactly was the vocabulary extended?
    From what I found online, there are multiple ways to extend the vocabulary, so I was wondering what exactly you did.

  2. After extending the vocabulary, do the embedding and lm_head layers need to be retrained?
    In the paper it's mentioned that after extending the vocabulary, you go through a round of fine-tuning. But from my understanding, fine-tuning won't train the lm_head and embedding layers.
    So what was done to train the two layers mentioned above?


Please stay tuned. We will open-source the code later. For the earliest notification, consider joining our waitlist: https://www.nexa4ai.com/contact

Curious to check out the open-source codebase soon to learn the details!

Nexa AI org

Hi @TeddyB

  1. We add functional tokens to the vocabulary; see
    https://huggingface.co./NexaAIDev/Octopus-v2/blob/main/tokenizer_config.json
  2. We will soon prepare a training pipeline on AWS / Google Cloud for custom API training requirements.
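
For readers following along, here is a minimal sketch of what "adding functional tokens to the vocabulary" involves mechanically. It is a toy illustration in plain Python, not the Nexa AI training code, which had not been released at the time of this thread. The token names (`<fn_0>` ... `<fn_19>`), the `extend_vocab` helper, and the mean-plus-noise initialisation are all assumptions made for the example. In Hugging Face Transformers, the corresponding steps are usually done with `tokenizer.add_special_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`.

```python
import random


def extend_vocab(vocab, embeddings, new_tokens, seed=0):
    """Append new tokens to a toy vocabulary and grow the embedding table.

    vocab:      dict mapping token string -> integer id
    embeddings: list of rows (lists of floats), one row per token id
    new_tokens: token strings to add; tokens already present are skipped

    New rows start at the mean of the existing rows plus small noise.
    This is a common heuristic and an assumption here, not a confirmed
    detail of how Octopus was trained.
    Returns the ids assigned to the newly added tokens.
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    # Mean over the original rows only, computed before anything is appended.
    mean_row = [sum(row[j] for row in embeddings) / len(embeddings)
                for j in range(dim)]
    added_ids = []
    for token in new_tokens:
        if token in vocab:
            continue
        vocab[token] = len(vocab)  # next free id
        embeddings.append([m + rng.gauss(0.0, 0.01) for m in mean_row])
        added_ids.append(vocab[token])
    return added_ids


# Toy base vocabulary with 2-dimensional embeddings.
vocab = {"<pad>": 0, "hello": 1, "world": 2}
embeddings = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]

# Hypothetical placeholder names: one token per custom function.
functional_tokens = [f"<fn_{i}>" for i in range(20)]
new_ids = extend_vocab(vocab, embeddings, functional_tokens)
```

The rows indexed by `new_ids` are the ones that carry no pretrained signal, so they are the part of the embedding table (and of the matching output layer) that subsequent fine-tuning has to learn.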

Hi @zackli4ai ,

Thanks for the info. I now see the new special tokens added to the tokenizer.

I have some follow-up questions:

  1. Have you tried your technique of adding new functional tokens to other base models, like MS Phi-3 Mini or Meta Llama 2 8b?
  2. Are you also planning on releasing the dataset you used to train the model?
Nexa AI org

@TeddyB

  1. Yes, Octopus-V4 is based on Phi-3: https://huggingface.co./NexaAIDev/Octopus-v4
  2. We are building a training pipeline on AWS / Google Cloud
    Thanks for the questions.
