A dynamic parallel method for performance optimization on hybrid CPUs Paper • 2411.19542 • Published Nov 29, 2024
Spatial Computing: Concept, Applications, Challenges and Future Directions Paper • 2402.07912 • Published Jan 30, 2024
From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation Paper • 2404.09138 • Published Apr 14, 2024
Introducing v0.5 of the AI Safety Benchmark from MLCommons Paper • 2404.12241 • Published Apr 18, 2024
TEQ: Trainable Equivalent Transformation for Quantization of LLMs Paper • 2310.10944 • Published Oct 17, 2023
Efficient Post-training Quantization with FP8 Formats Paper • 2309.14592 • Published Sep 26, 2023
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs Paper • 2309.05516 • Published Sep 11, 2023
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs Paper • 2306.16601 • Published Jun 28, 2023
Prune Once for All: Sparse Pre-Trained Language Models Paper • 2111.05754 • Published Nov 10, 2021
Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length Paper • 2111.09645 • Published Nov 18, 2021