arcee-intel (arcee-intel-colab)

Haihao

authored a paper 3 months ago

A dynamic parallel method for performance optimization on hybrid CPUs

Paper • 2411.19542 • Published Nov 29, 2024 • 5

Haihao

authored 3 papers 5 months ago

Alyosha11

authored 2 papers 9 months ago

Spatial Computing: Concept, Applications, Challenges and Future Directions

Paper • 2402.07912 • Published Jan 30, 2024

From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation

Paper • 2404.09138 • Published Apr 14, 2024 • 4

ashahba

authored a paper 11 months ago

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Paper • 2404.12241 • Published Apr 18, 2024 • 11

Haihao

authored 4 papers over 1 year ago

TEQ: Trainable Equivalent Transformation for Quantization of LLMs

Paper • 2310.10944 • Published Oct 17, 2023 • 10

Efficient Post-training Quantization with FP8 Formats

Paper • 2309.14592 • Published Sep 26, 2023 • 11

Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Paper • 2309.05516 • Published Sep 11, 2023 • 10

An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs

Paper • 2306.16601 • Published Jun 28, 2023 • 4

Haihao

authored 2 papers almost 2 years ago

QuaLA-MiniLM: a Quantized Length Adaptive MiniLM

Paper • 2210.17114 • Published Oct 31, 2022

Prune Once for All: Sparse Pre-Trained Language Models

Paper • 2111.05754 • Published Nov 10, 2021 • 1

kding1

authored a paper almost 2 years ago

Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length

Paper • 2111.09645 • Published Nov 18, 2021

arcee-intel's activity

Efficient LLM Inference on CPUs

Effective Quantization for Diffusion Models on CPUs

Fast DistilBERT on CPUs

Team members 10
