
Hercules-Mini-1.8B-GGUF

Quantized GGUF model files for Hercules-Mini-1.8B from M4-ai

Original Model Card:

Hercules-Mini-1.8B

We fine-tuned Qwen1.5-1.8B on Locutusque's Hercules-v4.

Model Details

Model Description

This model has capabilities in math, coding, function calling, roleplay, and more. We fine-tuned it on 700,000 examples from Hercules-v4.

  • Developed by: M4-ai
  • Language(s) (NLP): English and maybe Chinese
  • License: tongyi-qianwen license
  • Finetuned from model: Qwen1.5-1.8B

Uses

General-purpose assistant, question answering, chain-of-thought reasoning, etc.

Bias, Risks, and Limitations

The EOS token was not set up properly. To prevent infinite generation, you'll need to implement a stopping criterion that halts generation as soon as the model emits the <|im_end|> token.
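One way to apply such a stopping criterion, independent of any particular inference runtime, is to cut the output at the first <|im_end|> marker. The sketch below is not the authors' method; the helper names `truncate_at_stop` and `stream_until_stop` are hypothetical, and it assumes the runtime hands back generated text either as one string or as a stream of string pieces.

```python
from typing import Iterable, Iterator

STOP_TOKEN = "<|im_end|>"  # end-of-turn marker named in the model card


def truncate_at_stop(text: str, stop: str = STOP_TOKEN) -> str:
    """Return the text before the first stop marker (hypothetical helper)."""
    index = text.find(stop)
    return text if index == -1 else text[:index]


def stream_until_stop(pieces: Iterable[str], stop: str = STOP_TOKEN) -> Iterator[str]:
    """Yield streamed text pieces and halt once the stop marker appears.

    A short tail is held back on every step because the marker may arrive
    split across two or more pieces.
    """
    buffer = ""
    for piece in pieces:
        buffer += piece
        index = buffer.find(stop)
        if index != -1:
            if index > 0:
                yield buffer[:index]
            return  # stop pulling further pieces from the source
        # Emit everything except a tail that could still begin the marker.
        safe = len(buffer) - (len(stop) - 1)
        if safe > 0:
            yield buffer[:safe]
            buffer = buffer[safe:]
    if buffer:
        yield buffer


if __name__ == "__main__":
    simulated = ["Hello", " world<|im", "_end|>", "<|im_start|>user"]
    print("".join(stream_until_stop(simulated)))
```

If `pieces` is a lazy generator driving the model, returning from `stream_until_stop` also stops requesting new tokens. Runtimes that accept a list of stop strings can achieve the same effect by passing `<|im_end|>` there instead.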

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

Evaluation

Coming soon

Training Details

Training Data

https://huggingface.co/datasets/Locutusque/hercules-v4.0

Training Hyperparameters

  • Training regime: bf16 non-mixed precision

Technical Specifications

Hardware

We trained on 8 Kaggle TPUs with a global batch size of 256 and a sequence length of 1536.
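For a rough sense of scale, the figures stated in this card can be combined as follows. This is a back-of-the-envelope sketch, not a description of the actual training loop: it assumes the global batch is split evenly across the 8 devices, that every sequence is filled to the full 1536 tokens (so the token count is an upper bound), and that the 700,000 examples are seen once per pass. The number of passes is not stated in the card.

```python
import math

NUM_DEVICES = 8          # from the card: 8 Kaggle TPUs
GLOBAL_BATCH_SIZE = 256  # from the card
SEQUENCE_LENGTH = 1536   # from the card
NUM_EXAMPLES = 700_000   # from the card: Hercules-v4 examples used

# Assumption: plain data parallelism with an even split across devices.
per_device_batch = GLOBAL_BATCH_SIZE // NUM_DEVICES

# Upper bound: real batches contain padding or shorter sequences.
max_tokens_per_step = GLOBAL_BATCH_SIZE * SEQUENCE_LENGTH

# Optimizer steps needed to see every example once.
steps_per_pass = math.ceil(NUM_EXAMPLES / GLOBAL_BATCH_SIZE)

print(f"per-device batch size: {per_device_batch}")        # 32
print(f"max tokens per step:   {max_tokens_per_step}")     # 393216
print(f"steps per pass:        {steps_per_pass}")          # 2735
```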

Contributions

Thanks to @Tonic, @aloobun, @fhai50032, and @Locutusque for their contributions to this model.

GGUF

  • Model size: 1.84B params
  • Architecture: qwen2
  • Quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit

