Guide for using CoreML models?

#20
by kerls - opened

Are there any templates or how-to guides for making Swift iOS apps using the exported LLM models?

I found https://github.com/huggingface/swift-coreml-transformers but it's not as easy as just swapping out the models because the input and output interfaces are different:

sample GPT2 CoreML model inside the repo:
[image: Screenshot 2023-07-17 at 4.35.19 PM.png]

The model that is output from this transformers-to-coreml space:
[image: Screenshot 2023-07-17 at 4.35.32 PM.png]

I'm upvoting this discussion because I indeed think a guide on how to make ML-powered iOS apps would be 🔥🔥🔥

maybe it can be community-led

cc @osanseviero @pcuenq with whom we've discussed in the past (and cc @Matthijs too!)

Core ML Projects org

I fully agree! There are many different pieces involved, and I'm working towards publishing something soon to get things started and grow from there.

In this particular case, @kerls, you are right: sometimes you need the logits, depending on what you are going to do with the model next. The exporters library and this tool try to do what's most useful most of the time, but we need to expose more options (coming soon).

In general, these are broadly the steps involved:

  • Conversion to Core ML. Also asked here and here. We aim for this Space to work in most cases, but newly-released models usually require some manual tweaking. I'll be publishing a guide for Falcon and Llama 2 in a few days.
  • Input preparation. This is usually referred to as "tokenization" in the transformers codebase, but there are many steps involved: normalization, pre-processing, post-processing, and the tokenization itself. You can check the nuances involved in the tokenizers repo.
  • Output preparation: go back from logits to tokens, and eventually text.
  • Generation algorithms: greedy search (use the most probable token, as predicted by the model, to continue the sequence), top-k sampling (sample from a few of the most probable tokens), beam search, etc. This is a great overview: https://huggingface.co/blog/how-to-generate
  • Optimization. Make your models run fast on your target hardware. Evaluation techniques are really important to understand the trade-offs between model size, speed and quality to meet your requirements. And it's not easy to evaluate or even compare language models.
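
To make the output preparation and generation steps above a bit more concrete, here is a minimal sketch of picking the next token from a vector of logits, once with greedy search and once with top-k sampling. It is written in plain Python for brevity (the same logic ports to Swift), the logits are made-up numbers for a hypothetical 5-token vocabulary, and the function names are illustrative, not part of any library.

```python
import math
import random

def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Greedy search: always take the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k_pick(logits, k, rng=random):
    """Top-k sampling: keep the k largest logits, renormalize, sample one."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top])
    return rng.choices(top, weights=probs, k=1)[0]

# Hypothetical logits for a 5-token vocabulary (made-up numbers).
logits = [1.0, 3.5, 0.2, 2.9, -1.0]

greedy_token = greedy_pick(logits)                               # always index 1
sampled_token = top_k_pick(logits, k=2, rng=random.Random(0))    # index 1 or 3
```

The chosen index would then go back through the tokenizer's vocabulary to recover text. A model that already returns token ids instead of logits only leaves room for whatever selection strategy was baked in at export time, which is why access to the logits matters here.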

This is an area of great interest to us: we've done lots of internal testing and it's time to start publishing artifacts. We can't wait to work together with the community on these topics!

Gotcha, thanks! I managed to figure out how to make the exported TinyStories-1M CoreML model work for me.

My tweaked version of the sample iOS app is here in case it's helpful to anyone else, alongside my notes / guide from my (limited) understanding. My fork should work with any text generation float32 model exported from this Space that also happens to be compatible with GPT2 tokenizers.

Looking forward to learning more about this as you publish artifacts :)

Core ML Projects org

@kerls We just published https://huggingface.co/blog/swift-coreml-llm and some new tools to generalize the process you followed :) Let us know if that helps!

Awesome! Great to see that we can now use the transformers library in Swift 🔥 and also the callouts re: generation methods (the fact that greedy will give the same output every time).
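
A small sketch of why greedy decoding repeats itself: the loop below always appends the highest-scoring token, so the same prompt always yields the same sequence. The `toy_logits` function is a hypothetical stand-in for a real model (it scores tokens from the last token only), used here purely so the example runs on its own in plain Python.

```python
def toy_logits(tokens, vocab_size=8):
    """Hypothetical stand-in for a model: scores depend only on the last token."""
    target = (tokens[-1] * 3 + 1) % vocab_size
    # The score is highest (0) at `target` and falls off with distance from it.
    return [-abs(target - i) for i in range(vocab_size)]

def generate_greedy(prompt, steps):
    """Autoregressive loop: append the arg-max token at every step."""
    tokens = list(prompt)
    for _ in range(steps):
        logits = toy_logits(tokens)
        tokens.append(max(range(len(logits)), key=lambda i: logits[i]))
    return tokens

first = generate_greedy([2], steps=5)
second = generate_greedy([2], steps=5)
assert first == second  # no randomness anywhere, so the runs are identical
```

Replacing the arg-max with a sampling step (for example top-k) is what introduces variation between runs.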
