llama-3-8b-instruct-262k-chinese

llama-3-8b-instruct-262k-chinese is a chat model based on Llama-3-8B-Instruct-262k, fine-tuned with the ORPO method on the Chinese-English preference dataset shibing624/DPO-En-Zh-20k-Preference.

For deployment, training, and related methods, see the MedicalGPT GitHub repository: https://github.com/shibing624/MedicalGPT

Related models

Features

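Strengths:

  1. Supports a very long context length of 262k tokens, which suits RAG
  2. Supports both Chinese and English
  3. Supports multi-turn dialogue, with strong coding and reasoning ability and solid English knowledge
  4. GPU memory required for inference: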
Quantization   Peak Usage for Encoding 2048 Tokens   Peak Usage for Generating 8192 Tokens
FP16/BF16      18.66GB                               24.58GB
Int4           9.21GB                                14.62GB
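
The peak figures above include more than the weights. As a rough sanity check, the sketch below estimates the weight footprint and the KV cache size. The architecture constants (32 layers, 8 key/value heads, head dimension 128) are the standard Llama-3-8B configuration and the parameter count is rounded to 8B; both are assumptions here, not values measured for this model. Activations and framework overhead are ignored, so measured peaks will be higher.

```python
# Back-of-the-envelope memory estimate (a sketch, not a measurement).
# Assumed constants: standard Llama-3-8B architecture, ~8B parameters.
N_PARAMS = 8.0e9
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
GIB = 1024 ** 3


def weight_gib(bits_per_param: float) -> float:
    """Memory taken by the weights alone at the given precision."""
    return N_PARAMS * bits_per_param / 8 / GIB


def kv_cache_gib(n_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache size: keys and values for every layer and KV head."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value
    return n_tokens * per_token / GIB


print(f"FP16 weights: {weight_gib(16):.2f} GiB")
print(f"Int4 weights: {weight_gib(4):.2f} GiB")
print(f"KV cache, 8192 tokens:   {kv_cache_gib(8192):.1f} GiB")
print(f"KV cache, 262144 tokens: {kv_cache_gib(262144):.1f} GiB")
```

Under these assumptions the FP16 cache costs 128 KiB per token, so a prompt that fills the full 262144-token window would need about 32 GiB for the cache alone, in addition to the weights.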

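Weaknesses:

  1. The model has only 8B parameters, so hallucination is noticeable in knowledge-based Q&A
  2. Chinese knowledge is limited and prone to hallucination, especially for classical Chinese; this is a common problem among Llama-family models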

How to use

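import torch
import transformers

model_id = "shibing624/llama-3-8b-instruct-262k-chinese"

# Load the model in half precision on the GPU.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.float16},
    device="cuda",
)

# Build the conversation: an empty system prompt followed by the user turn.
messages = [{"role": "system", "content": ""}]
# The user message means "Give an introduction to machine learning".
messages.append({"role": "user", "content": "介绍一下机器学习"})

# Render the messages with the chat template, ending with the assistant header.
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Stop on either the regular EOS token or the Llama-3 end-of-turn token.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# The pipeline returns prompt + completion; strip the prompt to keep only the reply.
content = outputs[0]["generated_text"][len(prompt):]
print(content)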

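Result (translated from the model's Chinese output):

Machine learning is an automated data analysis technique based on computer algorithms, used to learn from data and predict future outcomes. It is a subfield of artificial intelligence (AI) and data mining, and aims to discover patterns, relationships, and regularities in data by training and tuning algorithms.

Machine learning algorithms can be divided into three categories: supervised learning, unsupervised learning, and semi-supervised learning:

1. Supervised learning: in this type of learning, the algorithm is given a labeled dataset for training. The algorithm learns how to map input data to output data and makes predictions on new data. Common supervised learning algorithms include logistic regression, decision trees, support vector machines (SVM), random forests, and neural networks.
2. Unsupervised learning: in this type of learning, the algorithm has no labeled data. The algorithm learns the patterns, structure, and relationships in the data, and may discover new data clusters or features. Common unsupervised learning algorithms include clustering, principal component analysis (PCA), independent component analysis (ICA), and dimensionality reduction for high-dimensional data.
3. Semi-supervised learning: in this type of learning, the algorithm is given a partially labeled dataset. The algorithm learns how to map input data to output data and makes predictions on new data. Semi-supervised learning algorithms combine the advantages of supervised and unsupervised learning; common ones include self-labeling and graph-based semi-supervised learning.

Machine learning is widely applied in areas including natural language processing, computer vision, recommender systems, artificial intelligence, and autonomous driving. Its advantages include:

1. Automation: machine learning algorithms can automatically discover patterns and relationships in data without human intervention.
2. Efficiency: machine learning algorithms can process large amounts of data and make predictions without human intervention.
3. Adaptability: machine learning algorithms can adjust as the dataset changes and is updated.
4. Accuracy: machine learning algorithms can improve prediction accuracy through training and testing.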
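
The single-turn example extends to multi-turn dialogue by keeping the full message list and appending each reply before the next user turn. The sketch below shows only that bookkeeping. `chat_turn` and `echo_generate` are hypothetical names introduced for illustration; `echo_generate` is a stand-in so the sketch runs without a GPU.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    messages: List[Message],
    user_text: str,
    generate: Callable[[List[Message]], str],
) -> str:
    """Append a user turn, get a reply from `generate`, and record it."""
    messages.append({"role": "user", "content": user_text})
    reply = generate(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply


def echo_generate(messages: List[Message]) -> str:
    """Stand-in generator (hypothetical) that echoes the last user message."""
    return f"(reply to: {messages[-1]['content']})"


history: List[Message] = [{"role": "system", "content": ""}]
chat_turn(history, "What is machine learning?", echo_generate)
chat_turn(history, "Give one example.", echo_generate)
print([m["role"] for m in history])
# ['system', 'user', 'assistant', 'user', 'assistant']
```

In real use, `generate` would render `messages` with `pipeline.tokenizer.apply_chat_template` and call the pipeline with the same terminators as in the single-turn example.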

Training details

train loss:

eval loss:

About Llama-3-8B-Instruct-262k

Gradient incorporates your data to deploy autonomous assistants that power critical operations across your business. To learn more or collaborate on a custom model, contact Gradient.

This model, developed by Gradient with compute sponsored by Crusoe Energy, extends Llama-3 8B's context length from 8k to more than 160K tokens. It demonstrates that SOTA LLMs can learn to operate on long contexts with minimal training (< 200M tokens) by appropriately adjusting RoPE theta.

Approach:

  • meta-llama/Meta-Llama-3-8B-Instruct as the base
  • NTK-aware interpolation [1] to initialize an optimal schedule for RoPE theta, followed by a new data-driven RoPE theta optimization technique
  • Progressive training on increasing context lengths similar to the Large World Model [2] (See details below)
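
To make the role of RoPE theta concrete, the sketch below computes the rotary inverse frequencies for a given theta and applies the commonly cited NTK-aware base adjustment, theta' = theta * scale^(d / (d - 2)). This is an illustration, not Gradient's procedure: the base theta of 500,000 and head dimension of 128 are assumed from the standard Llama-3-8B configuration, and the data-driven optimization step is not reproduced. The reported theta values are taken from the progressive training table and are expected to differ from the plain NTK-aware initialization.

```python
import math


def rope_inv_freq(theta: float, head_dim: int = 128) -> list:
    """Inverse frequencies theta^(-2i/d) for i = 0 .. d/2 - 1."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(theta: float, head_dim: int = 128) -> float:
    """Wavelength in tokens of the slowest-rotating dimension pair."""
    return 2 * math.pi / rope_inv_freq(theta, head_dim)[-1]


def ntk_aware_theta(base_theta: float, scale: float, head_dim: int = 128) -> float:
    """NTK-aware base adjustment: theta' = theta * scale^(d / (d - 2))."""
    return base_theta * scale ** (head_dim / (head_dim - 2))


BASE_THETA = 500_000.0  # assumed rope_theta of the base model
BASE_CONTEXT = 8192     # "8k" in the text above

for name, target_len, reported in [("65K", 2**16, 15.3e6), ("262K", 2**18, 207.1e6)]:
    init = ntk_aware_theta(BASE_THETA, target_len / BASE_CONTEXT)
    print(
        f"{name}: NTK-aware init {init / 1e6:.1f}M, "
        f"reported {reported / 1e6:.1f}M, "
        f"longest wavelength at reported theta {longest_wavelength(reported):,.0f} tokens"
    )
```

A larger theta slows the rotation of every dimension pair, so positions far apart remain distinguishable over a longer window.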

Infra:

We build on top of the EasyContext Blockwise RingAttention library [3] to scalably and efficiently train on contexts up to 262144 tokens on Crusoe Energy's high-performance L40S cluster.

Data:

For training data, we generate long contexts by augmenting SlimPajama.

Progressive Training Details:

Parameter                    65K               262K
Initialize From              LLaMA-3-8B-Inst   65K
Sequence Length              2^16              2^18
RoPE theta                   15.3 M            207.1 M
Batch Size (Tokens / Step)   2.097 M           4.192 M
Steps                        30                24
Total Tokens                 63 M              101 M
Learning Rate                2.00E-05          2.00E-05
# GPUs                       32                32
GPU Type                     NVIDIA L40S       NVIDIA L40S
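
The table can be cross-checked with simple arithmetic: steps times batch size should reproduce the total tokens, and batch size divided by sequence length gives the approximate number of sequences per step. The sketch below uses only the numbers from the table; the dictionary layout and variable names are illustrative.

```python
# Values copied from the progressive training table.
stages = {
    "65K": {"seq_len": 2**16, "batch_tokens": 2.097e6, "steps": 30, "reported_total": 63e6},
    "262K": {"seq_len": 2**18, "batch_tokens": 4.192e6, "steps": 24, "reported_total": 101e6},
}

summary = {}
for name, s in stages.items():
    total_tokens = s["batch_tokens"] * s["steps"]
    seqs_per_step = s["batch_tokens"] / s["seq_len"]
    summary[name] = {"total_tokens": total_tokens, "seqs_per_step": seqs_per_step}
    print(
        f"{name}: {total_tokens / 1e6:.1f}M tokens computed "
        f"vs {s['reported_total'] / 1e6:.0f}M reported, "
        f"about {round(seqs_per_step)} sequences per step"
    )

grand_total = sum(v["total_tokens"] for v in summary.values())
print(f"Both stages: {grand_total / 1e6:.1f}M tokens")
```

The combined total stays below the 200M-token budget quoted in the model description.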