🔒 On-device inference: no data sent to a server
⚡️ WebGPU-accelerated (> 20 tokens/s)
📥 Model downloaded once and cached
Try it out: Xenova/experimental-phi3-webgpu
This is so cool!
How do you obtain the wasm file? I didn't find it here: https://cdn.jsdelivr.net/npm/@xenova/[email protected]/dist/
cc: @Xenova
This is really cool! Performance is really good. I am running this on Chrome and Chrome unstable on Arch Linux with an RTX 3050 with 4 GB RAM on a Dell XPS 17. Unfortunately, inference starts super fast, but after a few sentences I get what looks like a Vulkan memory error:
vkAllocateMemory failed with VK_ERROR_OUT_OF_DEVICE_MEMORY
From that point on, the streaming only returns garbage. Will investigate further and see if I can get this running without crashing.