smpanaro/coreml-joint-compression-test


---
license: mit
---
A series of models for testing the benefits of CoreML joint compression (combining compression techniques, such as palettization with an int8 lookup table) on iOS 18/macOS 15.

# mlp-*.mlpackage
A simple Up/Gate/SiLU/Down MLP, repeated four times, using the Llama 2 7B dimensions (hidden size 4096, intermediate size 11008).
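
As a rough illustration of the block described above, here is a minimal pure-Python sketch. It assumes the usual Llama-style gating, `down(silu(gate(x)) * up(x))`, which this README does not spell out. The helper names (`mlp`, `matvec`, `weight_bytes`) are hypothetical and are not taken from the code that produced the models. The size estimate ignores lookup-table and metadata overhead.

```python
import math

# Llama 2 7B MLP dimensions (hidden size and intermediate size).
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_BLOCKS = 4  # the MLP is repeated four times


def silu(x: float) -> float:
    """SiLU activation, x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def matvec(weight, x):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def mlp(x, up, gate, down):
    """Assumed Llama-style gated MLP: down(silu(gate(x)) * up(x))."""
    gated = [silu(g) * u for g, u in zip(matvec(gate, x), matvec(up, x))]
    return matvec(down, gated)


def weight_bytes(bits_per_weight: int) -> int:
    """Approximate weight storage for all blocks (three projections each)."""
    params = NUM_BLOCKS * 3 * HIDDEN * INTERMEDIATE
    return params * bits_per_weight // 8


if __name__ == "__main__":
    for name, bits in [("float16", 16), ("4-bit", 4), ("2-bit", 2)]:
        print(f"{name}: {weight_bytes(bits) / 1e6:.0f} MB")
```

Under these assumptions the four blocks hold about 541 million weights: roughly 1.08 GB in float16, 0.27 GB at 4 bits, and 0.14 GB at 2 bits.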

All measurements use the 'CPU and Neural Engine' compute unit and were taken in Xcode.
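
For a quick check outside Xcode, the same compute unit can be selected from Python with coremltools. This is only a sketch: `time_model` and `summarize` are hypothetical helpers, the input names and shapes must be read from the model itself, and Python-side timings include call overhead, so they will not match the Xcode numbers.

```python
import statistics
import time


def summarize(latencies_ms):
    """Return (minimum, median) latency in ms, mirroring the table columns."""
    return min(latencies_ms), statistics.median(latencies_ms)


def time_model(path, inputs, runs=50, warmup=5):
    """Time predictions for one .mlpackage on the CPU and Neural Engine.

    `inputs` is a dict of input name -> array. The names and shapes depend
    on the model and are not assumed here.
    """
    import coremltools as ct  # third-party; prediction requires macOS

    model = ct.models.MLModel(path, compute_units=ct.ComputeUnit.CPU_AND_NE)
    for _ in range(warmup):
        model.predict(inputs)  # warm-up runs are not timed
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        model.predict(inputs)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize(latencies_ms)
```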

|Device|Model|Precision|Minimum (ms)|Median (ms)|
|:--|:--|:--|--:|--:|
|M1 Max|mlp-float16|float16|19.30|19.42|
|M1 Max|mlp-4bit|4-bit LUT|5.93|5.98|
|M1 Max|mlp-2bit|2-bit LUT|5.92|6.11|
|M1 Max|mlp-4bit-int8|4-bit int8 LUT + A8|6.02|6.31|
|M1 Max|mlp-2bit-int8|2-bit int8 LUT + A8|6.00|6.18|
|M1 Max|mlp-int8-int8|W8A8|9.78|9.94|
|M4|mlp-4bit|4-bit LUT|-|4.19|
|M4|mlp-2bit|2-bit LUT|-|3.83|
|M4|mlp-4bit-int8|4-bit int8 LUT + A8|-|4.14|
|M4|mlp-2bit-int8|2-bit int8 LUT + A8|-|3.83|
|M4|mlp-int8-int8|W8A8|-|8.18|
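
To compare rows directly, the median latencies can be turned into speedup ratios. The sketch below copies the medians from the table. `speedup` is a hypothetical helper, and since there is no float16 row for M4, W8A8 serves as the M4 baseline.

```python
# Median latencies in milliseconds, copied from the table.
MEDIAN_MS = {
    "M1 Max": {
        "mlp-float16": 19.42,
        "mlp-4bit": 5.98,
        "mlp-2bit": 6.11,
        "mlp-4bit-int8": 6.31,
        "mlp-2bit-int8": 6.18,
        "mlp-int8-int8": 9.94,
    },
    "M4": {
        "mlp-4bit": 4.19,
        "mlp-2bit": 3.83,
        "mlp-4bit-int8": 4.14,
        "mlp-2bit-int8": 3.83,
        "mlp-int8-int8": 8.18,
    },
}


def speedup(device: str, model: str, baseline: str) -> float:
    """How many times faster `model` is than `baseline` on `device`."""
    return MEDIAN_MS[device][baseline] / MEDIAN_MS[device][model]


if __name__ == "__main__":
    for device, baseline in [("M1 Max", "mlp-float16"), ("M4", "mlp-int8-int8")]:
        for model in MEDIAN_MS[device]:
            if model != baseline:
                ratio = speedup(device, model, baseline)
                print(f"{device} {model}: {ratio:.2f}x vs {baseline}")
```

On M1 Max the LUT variants come out a little over 3x faster than float16, while W8A8 is just under 2x faster.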


# Download
```
huggingface-cli download \
--local-dir . \
--local-dir-use-symlinks False \
smpanaro/coreml-joint-compression-test \
--include "*.mlpackage/*"
```
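
The same download can be sketched from Python with `huggingface_hub.snapshot_download`. The glob `*.mlpackage/*` is an assumption about the intended include pattern (every file inside each `.mlpackage` directory). `download_models` and `matches` are hypothetical helper names.

```python
import fnmatch

REPO_ID = "smpanaro/coreml-joint-compression-test"
# Assumed glob: every file inside each .mlpackage directory.
ALLOW_PATTERNS = ["*.mlpackage/*"]


def matches(path: str) -> bool:
    """Local check of whether a repo file path matches the include glob."""
    return any(fnmatch.fnmatch(path, pattern) for pattern in ALLOW_PATTERNS)


def download_models(local_dir: str = ".") -> str:
    """Download only the .mlpackage contents into local_dir; return the path."""
    from huggingface_hub import snapshot_download  # third-party dependency

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=ALLOW_PATTERNS,
    )
```

Calling `download_models()` fetches the packages into the current directory. `matches("README.md")` returns `False`, so the model card itself is skipped.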