metadata
base_model: M4-ai/Hercules-Mini-1.8B
datasets:
  - Locutusque/hercules-v4.0
inference: true
language:
  - en
library_name: transformers
license: other
model_creator: M4-ai
model_name: Hercules-Mini-1.8B
pipeline_tag: text-generation
quantized_by: afrideva
tags:
  - gguf
  - ggml
  - quantized

Hercules-Mini-1.8B-GGUF

Quantized GGUF model files for Hercules-Mini-1.8B from M4-ai

Original Model Card:

Hercules-Mini-1.8B

We fine-tuned Qwen1.5-1.8B on Locutusque's Hercules-v4.

Model Details

Model Description

This model has capabilities in math, coding, function calling, roleplay, and more. We fine-tuned it using 700,000 examples of Hercules-v4.

  • Developed by: M4-ai
  • Language(s) (NLP): English, and possibly Chinese
  • License: tongyi-qianwen license
  • Finetuned from model: Qwen1.5-1.8B

Uses

General-purpose assistant, question answering, chain-of-thought reasoning, etc.

Bias, Risks, and Limitations

The EOS token was not set up properly, so to prevent endless generation you need to implement a stopping criterion that halts generation as soon as the model emits the <|im_end|> token.
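A minimal sketch of such a stopping criterion, written as library-agnostic Python. The helper name `stream_until_stop` and the idea of consuming decoded text pieces from a streaming backend are assumptions for illustration, not part of the original model card; the only detail taken from the card is the `<|im_end|>` marker.

```python
from typing import Iterable, Iterator

STOP_TOKEN = "<|im_end|>"  # marker named in the model card


def stream_until_stop(pieces: Iterable[str], stop: str = STOP_TOKEN) -> Iterator[str]:
    """Yield decoded text pieces and end cleanly once `stop` appears.

    Hypothetical helper: `pieces` stands in for whatever streaming
    interface your inference backend exposes.
    """
    buffer = ""
    for piece in pieces:
        buffer += piece
        idx = buffer.find(stop)
        if idx != -1:
            # Emit only the text before the marker, then stop consuming.
            if idx > 0:
                yield buffer[:idx]
            return
        # Hold back a tail that could be the start of a marker split
        # across two pieces (for example "<|im" followed by "_end|>").
        safe = len(buffer) - (len(stop) - 1)
        if safe > 0:
            yield buffer[:safe]
            buffer = buffer[safe:]
    # The stream ended without the marker: flush what is left.
    if buffer:
        yield buffer


if __name__ == "__main__":
    fake_stream = ["Hello", " world<|im", "_end|>", " text that should never appear"]
    print("".join(stream_until_stop(fake_stream)))
```

Many backends can do this for you: llama-cpp-python accepts a `stop` list on completion calls, and transformers lets you pass an `eos_token_id` to `generate` or supply a custom `StoppingCriteria`. The sketch above is only needed when you handle the token stream yourself.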

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

Evaluation

Coming soon

Training Details

Training Data

https://huggingface.co/datasets/Locutusque/hercules-v4.0

Training Hyperparameters

  • Training regime: bf16 non-mixed precision
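To illustrate what training without a higher-precision copy of the weights can mean in practice, the sketch below emulates bf16 rounding using only the standard library. bf16 keeps 8 bits of significand precision, so an update much smaller than the weight can be rounded away entirely. The helper `to_bf16` and the example values are assumptions for illustration and say nothing about the actual training code.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a value to the nearest bf16-representable number.

    Illustrative emulation: bf16 is the upper 16 bits of an IEEE float32,
    so we round the float32 bit pattern (nearest, ties to even) and zero
    the lower 16 bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) & 0xFFFFFFFF) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


weight = to_bf16(1.0)
small_update = 0.001  # hypothetical value, e.g. learning rate times gradient

# Near 1.0 the spacing between bf16 values is 2**-7 = 0.0078125,
# so this update is lost when the result is stored back in bf16.
print(to_bf16(weight + small_update) == weight)

# An update of one full spacing step survives.
print(to_bf16(weight + 2**-7))
```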

Technical Specifications

Hardware

We trained on 8 Kaggle TPUs with a global batch size of 256 and a sequence length of 1536.
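The figures above allow a rough back-of-the-envelope calculation. The sketch below assumes the batch is split evenly across the 8 devices (plain data parallelism), that every sequence is filled to the full 1536 tokens, and that the 700,000 examples are seen once; none of these assumptions is confirmed by the model card.

```python
import math

NUM_EXAMPLES = 700_000    # from the model description
GLOBAL_BATCH_SIZE = 256   # from the hardware section
SEQUENCE_LENGTH = 1536    # from the hardware section
NUM_DEVICES = 8           # assumed to share the batch evenly

per_device_batch = GLOBAL_BATCH_SIZE // NUM_DEVICES
tokens_per_step = GLOBAL_BATCH_SIZE * SEQUENCE_LENGTH
steps_per_epoch = math.ceil(NUM_EXAMPLES / GLOBAL_BATCH_SIZE)

print(per_device_batch)   # 32 sequences per device per step
print(tokens_per_step)    # 393216 tokens per step at most
print(steps_per_epoch)    # 2735 optimizer steps for one pass
```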

Contributions

Thanks to @Tonic, @aloobun, @fhai50032, and @Locutusque for their contributions to this model.