How is the inference so fast in this free hardware space?

#1
by mahiatlinux - opened

How is the inference so fast in this free hardware space?

Because that's the advantage of this architecture.
You're really only using about 2.7B parameters to generate each token.

Qwen org

Haha, it uses an API service; it's not actually running on this free hardware Space.

jklj077 changed discussion status to closed
