A pure C++ implementation, supporting CUDA, CPU, OpenCL, etc.
#17 by zhaode
https://github.com/wangzhaode/ChatGLM-MNN
- Pure C++.
- Depends only on MNN, so it supports multiple devices and is easy to deploy.
- Splits the model into 28 blocks so that different blocks can run on different devices.
- Slims the vocabulary from 150528 to 130528 tokens.
- Faster than the PyTorch implementation.
- Provides CLI and web demos.
- Supports forward inference on Android devices.