Performance problem with Sentis Whisper Tiny

#3
by deleted - opened

Hi, according to the post here, the main thread should only block for job/graphics scheduling.
https://discussions.unity.com/t/how-to-use-multi-threading/307154

I tested the Sentis Whisper Tiny demo on Android. The timing of each step is listed below.
Padding takes 23 s, which is far too long. Does it run on a sub-thread?
The other steps cost tens to hundreds of milliseconds each.
If I never call MakeReadable on a tensor and just pass it to the next model's Execute, the data is not read back, right?
How can I keep this from blocking the main thread?
Thank you.

before new TensorFloat(new TensorShape(1, numSamples), data)
TensorFloat time: 00:00:00.0108040
before ops.Pad(input, new int[] { 0, 0, 0, maxSamples - numSamples })
Pad time: 00:00:23.5512980
before spectroEngine.Execute(input30seconds)
Spectro time: 00:00:00.0336710
before encoderEngine.Execute(spectroOutput)
Encoder time: 00:00:00.4862310
before decoderEngine.Execute(inputs)
Decoder time: 00:00:00.8611970
before decoderEngine.Execute(inputs)
Decoder time: 00:00:00.0252620
before decoderEngine.Execute(inputs)
Decoder time: 00:00:00.0634260
before decoderEngine.Execute(inputs)
Decoder time: 00:00:00.0153540
before decoderEngine.Execute(inputs)
Decoder time: 00:00:00.0302340
before decoderEngine.Execute(inputs)
Decoder time: 00:00:00.0139590
That's surprising. The padding shouldn't be taking that long! Have you tried updating to Sentis 1.4.0? The new code is on Hugging Face now. You should make a backup of your project in case you want to go back.

deleted

I haven't tried 1.4.0. It seems that 1.4.0 is only compatible with Unity 2023, but my project can't be updated to 2023 right now. Can I use 1.4.0 with 2022.3?

Hi, you can get 1.4.0 through the Package Manager: add the package by name and enter 1.4.0-pre.3 as the version. It should still work with 2022, though we recommend 2023.
You can try making a copy of your project and updating the copy to 1.4.0.

deleted

Thank you. I updated to 1.4.0 and used the latest RunWhisper.cs code as a reference. It is much better: the pad op is no longer needed, so its 23 s cost is gone.
But the other steps still cost tens to hundreds of milliseconds. Any other suggestions for improving them?
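
For anyone reading along, one way to avoid a pad op is to zero-pad the sample array on the CPU before the tensor is created. This is only a minimal sketch of that idea, not the actual RunWhisper.cs code; the class and method names (AudioPadding, PadToLength) are made up for illustration.

```csharp
using System;

// Hypothetical helper, not part of Sentis or RunWhisper.cs.
public static class AudioPadding
{
    // Returns a new array of exactly maxSamples floats:
    // the input samples followed by zeros (silence).
    // Input longer than maxSamples is truncated.
    public static float[] PadToLength(float[] samples, int maxSamples)
    {
        if (samples == null) throw new ArgumentNullException(nameof(samples));
        if (maxSamples < 0) throw new ArgumentOutOfRangeException(nameof(maxSamples));

        // New C# arrays are zero-initialized, so only the real samples need copying.
        var padded = new float[maxSamples];
        Array.Copy(samples, padded, Math.Min(samples.Length, maxSamples));
        return padded;
    }
}
```

The padded array could then be passed to the same constructor shown in the log above, for example new TensorFloat(new TensorShape(1, maxSamples), padded), so the tensor already has the full length when it is created.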

Hi, for running on Android (depending on the device), 100 ms per inference call is probably not unreasonable; neural networks take a lot of resources. To improve the graphics frame rate you can try this: https://docs.unity3d.com/Packages/[email protected]/manual/split-inference-over-multiple-frames.html
which spreads the cost over multiple frames. Another thing you can try is compiling for ARM64.
Another trick is to do one or two inference calls when your application starts, to "warm up" the neural network.
I see you posted the question to the discussions forum too, which is good, and it looks like someone is going to take a look to see whether any other improvements are possible.
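
As a rough illustration of the "spread over multiple frames" idea: if the work is exposed as an IEnumerator that schedules one step per MoveNext call (the linked manual page describes how to get one from the worker), you can advance it by a fixed number of steps each frame. The sketch below is plain C# with no Sentis calls; FrameBudget and Advance are hypothetical names, and the right step count per frame depends on your device.

```csharp
using System.Collections;

// Hypothetical helper, not part of Sentis.
public static class FrameBudget
{
    // Advances the schedule by at most maxSteps steps.
    // Returns true once the enumerator is exhausted, false if work remains.
    public static bool Advance(IEnumerator schedule, int maxSteps)
    {
        for (int i = 0; i < maxSteps; i++)
        {
            if (!schedule.MoveNext())
                return true;
        }
        return false;
    }
}
```

Calling Advance once per Update and reading the output only after it returns true keeps each frame's share of the inference small. A smaller maxSteps gives a smoother frame rate at the price of higher latency for the result.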

deleted

Thank you for your help. Could you please have a look at my other discussion, about OpenGL ES compatibility?
I get correct results with Vulkan but not with OpenGL ES. I think this may be caused by limitations of OpenGL ES compute shaders.
Is there a workaround to make it work with OpenGL ES on Android?
