---
license: llama3
language:
- en
- zh
base_model: prithivMLmods/Deepthink-Llama-3-8B-Preview
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- llama-cpp
- gguf-my-repo
---

# Triangle104/Deepthink-Llama-3-8B-Preview-Q4_K_M-GGUF

This model was converted to GGUF format from [`prithivMLmods/Deepthink-Llama-3-8B-Preview`](https://huggingface.co./prithivMLmods/Deepthink-Llama-3-8B-Preview) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co./spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co./prithivMLmods/Deepthink-Llama-3-8B-Preview) for more details on the model.

---

The Deepthink-Llama-3-8B-Preview is a fine-tuned version of the Llama-3.1-8B base model, further enhanced with the Rethinking R1 Dataset Logits for superior text generation. This model is designed for advanced reasoning, structured problem-solving, and contextually rich outputs, making it an excellent choice for applications in education, programming, research, and creative writing.

With its optimized architecture, Deepthink-Llama-3-8B-Preview excels at:

- Logical reasoning and step-by-step problem solving
- Mathematical and coding tasks, leveraging specialized expert models
- Generating long-form content (up to 8K tokens) with improved coherence
- Understanding structured data, including tables and JSON outputs
- Instruction following and adapting to diverse system prompts, making it ideal for chatbots and AI assistants

## Key Features

- Supports long-context processing of up to 128K tokens
- Multilingual capabilities for 29+ languages, including English, Chinese, Spanish, French, German, Arabic, and more
- Fine-tuned using Supervised Fine-Tuning (SFT) and Reinforcement Learning with Human Feedback (RLHF)

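As a quick sanity check on the advertised context window, the maximum context length can be read directly from the checkpoint's config without downloading any weights. This is a minimal sketch; it assumes the config exposes the standard `max_position_embeddings` field used by Llama-family models:

```python
from transformers import AutoConfig

# Fetch only the configuration; no model weights are downloaded.
config = AutoConfig.from_pretrained("prithivMLmods/Deepthink-Llama-3-8B-Preview")

# For Llama-family checkpoints this field holds the maximum context length.
print(config.max_position_embeddings)
```
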
## Model Architecture

Deepthink-Llama-3-8B-Preview is built on the optimized transformer architecture of Llama-3.1-8B, integrating enhanced dataset logits from Rethinking R1 for better contextual understanding and output quality.

## Use with transformers

To run conversational inference using `transformers >= 4.43.0`, use the `pipeline` abstraction or leverage the `generate()` function with the Auto classes.

Ensure your environment is updated with:

```bash
pip install --upgrade transformers
```

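Note that the example below loads the model with `device_map="auto"`, which typically requires the accelerate package as well:

```bash
pip install accelerate
```
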
## Example Usage

```python
import torch
from transformers import pipeline

model_id = "prithivMLmods/Deepthink-Llama-3-8B-Preview"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```

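For finer control than the pipeline offers, the same conversation can be run through `generate()` with the Auto classes, as mentioned above. This is a minimal sketch using the tokenizer's built-in chat template; `max_new_tokens=256` mirrors the pipeline example and is illustrative rather than a tuned value:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/Deepthink-Llama-3-8B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the conversation with the model's chat template and tokenize it.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
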
## Intended Use

Deepthink-Llama-3-8B-Preview is designed for a wide range of applications requiring deep reasoning, structured outputs, and logical text generation. It is particularly suited for:

- Education & Research: Generating detailed explanations, step-by-step solutions, and structured academic content.
- Programming & Code Generation: Assisting in code writing, debugging, and algorithm explanations with improved logic structuring.
- AI Chatbots & Assistants: Providing context-aware, instruction-following responses for conversational AI applications.
- Creative Writing: Generating high-quality stories, articles, and structured narratives with coherence.
- Data Analysis & Structured Output Generation: Interpreting and generating JSON, tables, and formatted outputs for structured data processing (see the sketch after this list).

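To illustrate the structured-output use case, the sketch below asks the model for a JSON reply and validates it before use. The prompt and key names are invented for demonstration, and because generation is not guaranteed to produce well-formed JSON, parsing with `json.loads` serves as the check:

```python
import json

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="prithivMLmods/Deepthink-Llama-3-8B-Preview",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Reply with a single JSON object and nothing else."},
    {"role": "user", "content": 'Describe the Nile in JSON with keys "name", "continent", and "length_km".'},
]

outputs = pipe(messages, max_new_tokens=128)
reply = outputs[0]["generated_text"][-1]["content"]

# Validate the reply before passing it downstream; malformed JSON raises here.
data = json.loads(reply)
print(data)
```
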
## Limitations

While Deepthink-Llama-3-8B-Preview is optimized for deep reasoning and structured outputs, it has some limitations:

- Not a Real-time Knowledge Source: The model is trained on a fixed dataset and does not have real-time internet access. It may not provide up-to-date information on rapidly evolving topics.
- Potential Biases: As with all AI models, responses may reflect biases present in the training data. Users should critically evaluate outputs, especially in sensitive domains.
- Mathematical & Logical Reasoning Constraints: While strong in step-by-step reasoning, it may occasionally produce incorrect mathematical calculations or logical inconsistencies. External verification is recommended for critical applications.
- Handling of Extremely Long Contexts: While it supports up to 128K tokens, efficiency and coherence may degrade when processing very long documents or conversations.
- Limited Handling of Ambiguity: The model may struggle with highly ambiguous or context-dependent queries, sometimes generating plausible but incorrect responses.
- Ethical & Compliance Considerations: Not intended for generating misinformation, automating legal or medical decisions, or other high-risk applications without human oversight.

---

## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.

### CLI:

```bash
llama-cli --hf-repo Triangle104/Deepthink-Llama-3-8B-Preview-Q4_K_M-GGUF --hf-file deepthink-llama-3-8b-preview-q4_k_m.gguf -p "The meaning of life and the universe is"
```

### Server:

```bash
llama-server --hf-repo Triangle104/Deepthink-Llama-3-8B-Preview-Q4_K_M-GGUF --hf-file deepthink-llama-3-8b-preview-q4_k_m.gguf -c 2048
```

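Once the server is running, recent llama.cpp builds expose an OpenAI-compatible chat endpoint (on http://localhost:8080 by default), so a quick smoke test with curl looks like the following; the prompt and `max_tokens` value are arbitrary:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Who are you?"}
        ],
        "max_tokens": 128
      }'
```
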
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.

```bash
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with any other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).

```bash
cd llama.cpp && LLAMA_CURL=1 make
```

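Note that recent llama.cpp revisions have replaced the Makefile build with CMake, so if the `make` invocation above is rejected, the following is a rough CMake equivalent (option names such as `-DLLAMA_CURL=ON` reflect current versions and may vary; the resulting binaries land under `build/bin` rather than the repository root):

```bash
cmake -B build -DLLAMA_CURL=ON
cmake --build build --config Release
```
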
Step 3: Run inference through the main binary.

```bash
./llama-cli --hf-repo Triangle104/Deepthink-Llama-3-8B-Preview-Q4_K_M-GGUF --hf-file deepthink-llama-3-8b-preview-q4_k_m.gguf -p "The meaning of life and the universe is"
```

or

```bash
./llama-server --hf-repo Triangle104/Deepthink-Llama-3-8B-Preview-Q4_K_M-GGUF --hf-file deepthink-llama-3-8b-preview-q4_k_m.gguf -c 2048
```