Athene-V2-Chat AWQ 4-Bit Quantized Version
This repository provides an AWQ 4-bit quantized version of Athene-V2-Chat, a model originally developed by Nexusflow. Before quantization, the model's weights are padded with zeros so that the affected dimensions satisfy the divisibility constraints of multi-GPU tensor parallelism. The padding has minimal impact on computation while allowing the model to be sharded efficiently across multiple GPUs.
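The zero-padding idea can be illustrated with a minimal sketch. It is not the script used to produce this checkpoint: the helper names (`padded_size`, `pad_rows`, `pad_cols`), the toy dimensions, and the assumption that a dimension must be a multiple of `tp_degree * group_size` are all hypothetical and chosen only for illustration. The toy also omits the activation function between the two projections. It shows that padding the intermediate dimension of a pair of projections with zeros leaves the output unchanged.

```python
def padded_size(size: int, tp_degree: int, group_size: int) -> int:
    """Smallest value >= size that is a multiple of tp_degree * group_size.

    Assumption: each tensor-parallel shard must hold a whole number of
    quantization groups, so the dimension must divide by their product.
    """
    multiple = tp_degree * group_size
    return -(-size // multiple) * multiple  # integer ceiling division


def pad_rows(matrix, new_rows):
    """Append all-zero rows until the matrix has new_rows rows."""
    cols = len(matrix[0])
    extra = [[0.0] * cols for _ in range(new_rows - len(matrix))]
    return [row[:] for row in matrix] + extra


def pad_cols(matrix, new_cols):
    """Append zeros to every row until it has new_cols entries."""
    return [row + [0.0] * (new_cols - len(row)) for row in matrix]


def matvec(matrix, vec):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


# Toy projections: hidden=2 -> intermediate=3 -> hidden=2.
up = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]    # shape (3, 2)
down = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]   # shape (2, 3)
x = [1.0, -1.0]

# Pad the intermediate dimension from 3 up to the next valid size.
target = padded_size(3, tp_degree=2, group_size=2)
up_padded = pad_rows(up, target)        # extra output units are always zero
down_padded = pad_cols(down, target)    # extra inputs are multiplied by zero

original = matvec(down, matvec(up, x))
padded = matvec(down_padded, matvec(up_padded, x))
assert original == padded
```

The padded rows produce zero activations and the padded columns multiply those zeros, so the extra units contribute nothing to the result. The only cost is the additional multiply-adds on the zero entries.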