Andrew (sealad886)

AI & ML interests

None yet

Organizations

MLX Community

sealad886's activity

New activity in mlx-community/DeepSeek-R1-Distill-Qwen-1.5B-MLX about 3 hours ago

Multiple quants for MLX framework

#1 opened about 3 hours ago by sealad886
New activity in mlx-community/DeepSeek-R1-Distill-Qwen-3B-MLX about 3 hours ago

Multiple quants for MLX framework

#1 opened about 3 hours ago by sealad886
New activity in mlx-community/DeepSeek-R1-Distill-Qwen-7B-MLX about 4 hours ago

Multiple quants for MLX framework

#1 opened about 4 hours ago by sealad886
New activity in mlx-community/DeepSeek-R1-Distill-Qwen-14B-MLX about 4 hours ago

Multiple quants for MLX framework

#2 opened about 4 hours ago by sealad886

Multiple quants for MLX framework

#1 opened about 8 hours ago by sealad886
New activity in mlx-community/DeepSeek-R1-Distill-Qwen-32B-MLX about 6 hours ago

Multiple quants for MLX framework

#23 opened about 8 hours ago by sealad886
New activity in mlx-community/DeepSeek-R1-Distill-Qwen-32B-MLX about 8 hours ago

Multiple quants for MLX framework

#17 opened about 10 hours ago by sealad886
New activity in mlx-community/DeepSeek-R1-Distill-Qwen-32B-MLX about 10 hours ago

Parent PR: Add Multi-Quantization Support for DeepSeek-R1-Distill-Qwen-32B via MLX_LM

This PR introduces a new conversion pipeline that generates multiple quantized variants of **DeepSeek-R1-Distill-Qwen-32B** using the **MLX_LM** tool. Unlike previous methods based on llama.cpp, this implementation leverages MLX_LM's `quant_predicate` configuration to produce high-quality mixed-bit quantizations optimized specifically for MLX inference.

Key changes and features:

- **MLX_LM-based conversion:** All model conversions are performed with MLX_LM, using parameters such as `q_bits`, `q_group_size`, and the distinctive `quant_predicate` option (e.g. `"mixed_3_6"`, `"mixed_2_6"`) to create finely tuned quantized models, giving a strong balance between quality and performance for MLX inference. (A rough conversion sketch follows this entry.)
- **Asynchronous workflow:** The pipeline supports asynchronous conversion and upload tasks; each quantized variant is generated concurrently and then uploaded to the designated Hugging Face repository, streamlining the overall process.
- **Updated documentation:** The repository's README has been fully updated to reflect the MLX_LM conversion process, with clear instructions on prompt formatting, downloading individual variants, and running the models with MLX. The documentation emphasizes that these quantizations are for MLX only and are not intended for GGUF deployments.
- **Enhanced user flexibility:** With multiple quantization options (including bf16, Q8_0, Q6_K, Q5_K_M, Q4_K_M, IQ4_NL, etc.), users can select the variant that best fits their hardware and performance requirements; detailed usage and download instructions make deployment straightforward.

Benefits:

- **Optimized for MLX:** The quantized models are built specifically for MLX inference, ensuring performance and compatibility with MLX's specialized runtime.
- **Scalability and future-proofing:** The modular pipeline makes it easy to add further quantization recipes and enhancements while keeping the conversion process aligned with MLX_LM's capabilities.
- **Comprehensive documentation:** The updated README and model card provide thorough guidance on model usage, including prompt format, download instructions, and hardware-specific recommendations.

This PR is a significant step toward making DeepSeek-R1-Distill-Qwen-32B more accessible and versatile for MLX users. It serves as the parent PR that every subsequent quantized model upload will reference, ensuring consistency and traceability across releases. Feedback and suggestions are welcome.

#22 opened about 10 hours ago by sealad886
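As a rough illustration of the kind of pipeline this Parent PR describes (not the author's actual script), the sketch below drives the mlx_lm conversion CLI once per quantization recipe and runs the conversions concurrently. The base model and upload repo ids are plausible candidates rather than values taken from the PR, the recipe list, output directories, and worker count are invented, and the flag names should be verified against the installed mlx_lm version.

```python
# Hypothetical sketch only -- not the pipeline used in this PR. It converts the
# base model into several quantized MLX variants by invoking the mlx_lm
# conversion CLI once per recipe, running conversions concurrently.
import subprocess
from concurrent.futures import ThreadPoolExecutor

HF_MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"           # assumed base repo
UPLOAD_REPO = "mlx-community/DeepSeek-R1-Distill-Qwen-32B-MLX"  # assumed target repo

# (output directory, quantization flags) for each variant to produce.
RECIPES = [
    ("out/8bit",      ["-q", "--q-bits", "8", "--q-group-size", "64"]),
    ("out/4bit",      ["-q", "--q-bits", "4", "--q-group-size", "64"]),
    ("out/mixed_3_6", ["-q", "--quant-predicate", "mixed_3_6"]),
    ("out/mixed_2_6", ["-q", "--quant-predicate", "mixed_2_6"]),
]

def convert_one(out_dir: str, quant_flags: list[str]) -> None:
    """Run a single mlx_lm conversion in its own subprocess."""
    cmd = [
        "python", "-m", "mlx_lm.convert",
        "--hf-path", HF_MODEL,
        "--mlx-path", out_dir,
        *quant_flags,
        # "--upload-repo", UPLOAD_REPO,  # uncomment to push the variant to the Hub
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Two conversions at a time; the heavy lifting happens in the child
    # processes, so a thread pool is enough to overlap convert/upload work.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(convert_one, d, f) for d, f in RECIPES]
        for fut in futures:
            fut.result()  # surface any conversion failure
```

Driving the CLI from subprocesses keeps each variant isolated, which is one simple way to get the concurrent convert-and-upload behaviour the description mentions; the PR's own pipeline may well be structured differently.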

Parent PR: Add Multi-Quantization Support for DeepSeek-R1-Distill-Qwen-32B via MLX_LM (same description as #22 above)

#21 opened about 10 hours ago by sealad886
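The Parent PR description above also mentions prompt formatting and running the variants with MLX. A minimal, hypothetical usage sketch with mlx_lm follows; the repo id is the one from this activity feed, while the prompt and generation settings are placeholders rather than anything taken from the actual README.

```python
# Minimal sketch, assuming a quantized variant is available under the repo id
# from this feed; the exact variant layout and README instructions may differ.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/DeepSeek-R1-Distill-Qwen-32B-MLX")

# DeepSeek-R1 distills expect their chat template, so build the prompt with it.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Briefly explain mixed 3/6-bit quantization."}],
    tokenize=False,
    add_generation_prompt=True,
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```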