Andrew (sealad886)

AI & ML interests

None yet

Organizations

MLX Community

sealad886's activity

New activity in mlx-community/DeepSeek-R1-Distill-Qwen-1.5B-MLX about 3 hours ago

Multiple quants for MLX framework

#1 opened about 3 hours ago by sealad886
New activity in mlx-community/DeepSeek-R1-Distill-Qwen-3B-MLX about 3 hours ago

Multiple quants for MLX framework

#1 opened about 3 hours ago by sealad886
New activity in mlx-community/DeepSeek-R1-Distill-Qwen-7B-MLX about 4 hours ago

Multiple quants for MLX framework

#1 opened about 4 hours ago by sealad886
New activity in mlx-community/DeepSeek-R1-Distill-Qwen-14B-MLX about 4 hours ago

Multiple quants for MLX framework

#2 opened about 4 hours ago by sealad886

Multiple quants for MLX framework

#1 opened about 8 hours ago by sealad886
New activity in mlx-community/DeepSeek-R1-Distill-Qwen-32B-MLX about 6 hours ago

Multiple quants for MLX framework

#23 opened about 8 hours ago by sealad886
New activity in mlx-community/DeepSeek-R1-Distill-Qwen-32B-MLX about 8 hours ago

Multiple quants for MLX framework

#17 opened about 10 hours ago by sealad886
New activity in mlx-community/DeepSeek-R1-Distill-Qwen-32B-MLX about 10 hours ago

Parent PR: Add Multi-Quantization Support for DeepSeek-R1-Distill-Qwen-32B via MLX_LM

This PR introduces a new conversion pipeline that generates multiple quantized variants of **DeepSeek-R1-Distill-Qwen-32B** using the **MLX_LM** tool. Unlike previous methods based on llama.cpp, this implementation leverages MLX_LM's `quant_predicate` configuration to produce high-quality mixed-bit quantizations optimized specifically for MLX inference.

Key changes and features:

- **MLX_LM-based conversion:** All model conversions are performed with MLX_LM, using parameters such as `q_bits`, `q_group_size`, and the distinctive `quant_predicate` option (e.g. `"mixed_3_6"`, `"mixed_2_6"`) to create finely tuned quantized models, giving a strong balance between quality and performance for MLX inference. (A rough conversion sketch follows this entry.)
- **Asynchronous workflow:** The pipeline supports asynchronous conversion and upload tasks; each quantized variant is generated concurrently and then uploaded to the designated Hugging Face repository, streamlining the overall process.
- **Updated documentation:** The repository's README has been fully updated to reflect the MLX_LM conversion process, with clear instructions on prompt formatting, downloading individual variants, and running the models with MLX. The documentation emphasizes that these quantizations are for MLX only and are not intended for GGUF deployments.
- **Enhanced user flexibility:** With multiple quantization options (including bf16, Q8_0, Q6_K, Q5_K_M, Q4_K_M, IQ4_NL, etc.), users can select the variant that best fits their hardware and performance requirements; detailed usage and download instructions make deployment straightforward.

Benefits:

- **Optimized for MLX:** The quantized models are built specifically for MLX inference, ensuring performance and compatibility with MLX's specialized runtime.
- **Scalability and future-proofing:** The modular pipeline makes it easy to add further quantization recipes and enhancements while keeping the conversion process aligned with MLX_LM's capabilities.
- **Comprehensive documentation:** The updated README and model card provide thorough guidance on model usage, including prompt format, download instructions, and hardware-specific recommendations.

This PR is a significant step toward making DeepSeek-R1-Distill-Qwen-32B more accessible and versatile for MLX users. It serves as the parent PR that every subsequent quantized model upload will reference, ensuring consistency and traceability across releases. Feedback and suggestions are welcome.

#22 opened about 10 hours ago by sealad886
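As a rough illustration of the kind of pipeline this Parent PR describes (not the author's actual script), the sketch below drives the mlx_lm conversion CLI once per quantization recipe and runs the conversions concurrently. The base model and upload repo ids are plausible candidates rather than values taken from the PR, the recipe list, output directories, and worker count are invented, and the flag names should be verified against the installed mlx_lm version.

```python
# Hypothetical sketch only -- not the pipeline used in this PR. It converts the
# base model into several quantized MLX variants by invoking the mlx_lm
# conversion CLI once per recipe, running conversions concurrently.
import subprocess
from concurrent.futures import ThreadPoolExecutor

HF_MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"           # assumed base repo
UPLOAD_REPO = "mlx-community/DeepSeek-R1-Distill-Qwen-32B-MLX"  # assumed target repo

# (output directory, quantization flags) for each variant to produce.
RECIPES = [
    ("out/8bit",      ["-q", "--q-bits", "8", "--q-group-size", "64"]),
    ("out/4bit",      ["-q", "--q-bits", "4", "--q-group-size", "64"]),
    ("out/mixed_3_6", ["-q", "--quant-predicate", "mixed_3_6"]),
    ("out/mixed_2_6", ["-q", "--quant-predicate", "mixed_2_6"]),
]

def convert_one(out_dir: str, quant_flags: list[str]) -> None:
    """Run a single mlx_lm conversion in its own subprocess."""
    cmd = [
        "python", "-m", "mlx_lm.convert",
        "--hf-path", HF_MODEL,
        "--mlx-path", out_dir,
        *quant_flags,
        # "--upload-repo", UPLOAD_REPO,  # uncomment to push the variant to the Hub
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Two conversions at a time; the heavy lifting happens in the child
    # processes, so a thread pool is enough to overlap convert/upload work.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(convert_one, d, f) for d, f in RECIPES]
        for fut in futures:
            fut.result()  # surface any conversion failure
```

Driving the CLI from subprocesses keeps each variant isolated, which is one simple way to get the concurrent convert-and-upload behaviour the description mentions; the PR's own pipeline may well be structured differently.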

Parent PR: Add Multi-Quantization Support for DeepSeek-R1-Distill-Qwen-32B via MLX_LM (same description as #22 above)

#21 opened about 10 hours ago by sealad886
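The Parent PR description above also mentions prompt formatting and running the variants with MLX. A minimal, hypothetical usage sketch with mlx_lm follows; the repo id is the one from this activity feed, while the prompt and generation settings are placeholders rather than anything taken from the actual README.

```python
# Minimal sketch, assuming a quantized variant is available under the repo id
# from this feed; the exact variant layout and README instructions may differ.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/DeepSeek-R1-Distill-Qwen-32B-MLX")

# DeepSeek-R1 distills expect their chat template, so build the prompt with it.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Briefly explain mixed 3/6-bit quantization."}],
    tokenize=False,
    add_generation_prompt=True,
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```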