nm-testing/TinyLlama-1.1B-Chat-v1.0-W8A16_channel-e2e • Text Generation
nm-testing/TinyLlama-1.1B-Chat-v1.0-W4A16_channel-e2e • Text Generation
nm-testing/TinyLlama-1.1B-Chat-v1.0-actorder-weight-e2e • Text Generation
nm-testing/TinyLlama-1.1B-Chat-v1.0-actorder-group-e2e • Text Generation
nm-testing/TinyLlama-1.1B-Chat-v1.0-W4A16_2of4_channel-e2e • Text Generation
nm-testing/TinyLlama-1.1B-Chat-v1.0-W8A8_tensor_weight_static_per_tensor_act-e2e • Text Generation
nm-testing/TinyLlama-1.1B-Chat-v1.0-W8A8_channel_weight_static_per_tensor-e2e • Text Generation
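The checkpoints above are end-to-end test exports of TinyLlama-1.1B-Chat quantized under different weight/activation schemes (W8A16, W4A16, W8A8, and 2:4 sparsity). A minimal sketch of how such a checkpoint could be served, assuming it is a compressed-tensors export loadable by a recent vLLM build; the chosen model, prompt, and sampling settings are illustrative only, not part of the listing:

```python
# Sketch: load one of the quantized TinyLlama test checkpoints with vLLM.
# Assumption: the checkpoint is in a quantization format vLLM understands
# (e.g. compressed-tensors); prompt and sampling values are arbitrary examples.
from vllm import LLM, SamplingParams

llm = LLM(model="nm-testing/TinyLlama-1.1B-Chat-v1.0-W4A16_channel-e2e")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What is channel-wise weight quantization?"], params)

for out in outputs:
    # Each RequestOutput holds one or more completions; print the first.
    print(out.outputs[0].text)
```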
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models • Paper • arXiv:2203.07259 • Published Mar 14, 2022
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment • Paper • arXiv:2405.03594 • Published May 6, 2024
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization • Paper • arXiv:2411.02355 • Published Nov 4, 2024
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4 • 46
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4 • 46
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence • Paper • arXiv:2405.15593 • Published May 24, 2024