Speed comparison with FasterTransformer for ChatGLM?

#10
by xiangli - opened

Hi, great work and thanks for sharing.
According to the post (https://mp.weixin.qq.com/s/uV4Y_q4GnTUAsRVHxJGxGA), the inference code is based on FT, with customizations made for speed.
Could you kindly share a speed comparison between FT and lyraChatGLM?
Thanks.

Tencent Music Entertainment Lyra Lab org
edited May 25, 2023

@xiangli Hi, the original FT doesn't natively support ChatGLM (some ops behave differently). We're still working on fixing these problems and will report the speed of a pure FT version later.

Tencent Music Entertainment Lyra Lab org

@xiangli We have updated to a new accelerated version and removed the previous TensorRT-accelerated version. The new version has been significantly optimized at the source-code level, improving performance, ease of use, and GPU compatibility. Please update and feel free to try it out.

vanewu changed discussion status to closed
