levulinh's picture
Update README.md
afef841 verified
metadata
library_name: transformers
license: llama3.1
language:
  - ko
  - en
base_model:
  - meta-llama/Meta-Llama-3.1-8B-Instruct
model-index:
  - name: HelpyEdu
    results:
      - task:
          type: text-generation
        dataset:
          type: openai_humaneval
          name: HumanEval (Prompted)
        metrics:
          - name: pass@1
            type: pass@1
            value: 0.682
            verified: false

Model Card: Helpy-EDU-B-0916

Model Details

  • Model Name: Helpy-EDU-B-0916
  • Base Model: meta-lama/lama-3.1-8b-instruct
  • Model Size: 8 billion parameters
  • Model Type: Instruction-tuned Large Language Model (LLM)

Model Description

Helpy-EDU-B-0916 is a large language model fine-tuned to assist with educational tasks, focusing on safe and ethical conversations in both English and Korean. It is designed to provide accurate, helpful, and context-aware responses to instructional prompts, making it ideal for applications in education, tutoring, and content generation.

This model was fine-tuned from the base model meta-lama/lama-3.1-8b-instruct, leveraging high-quality data sources and optimized for multilingual environments.

Training Data

The model was fine-tuned using the following datasets:

  • AI Instructions from AI HUB: This dataset provides diverse AI-related instructions, enhancing the model’s ability to understand and follow detailed prompts.
  • Korean Safe Conversations: A curated dataset emphasizing safe, respectful, and culturally sensitive dialogues in Korean, ensuring the model adheres to ethical communication standards when interacting in Korean.

Intended Use

Helpy-EDU-B-0916 is tailored for the following use cases:

  • Educational Assistance: Responding to student queries, generating content for lessons, and aiding in language learning.
  • Bilingual Conversations: Supporting both English and Korean interactions with a focus on safety and appropriateness.
  • AI Instruction Following: Providing detailed and context-aware responses to instructional queries.

Limitations and Biases

  • Korean Language Proficiency: While the model has been fine-tuned with Korean safe conversations, it may still struggle with certain idiomatic or dialectal variations in the Korean language.
  • Instruction Bias: The model's responses are based on the instruction-tuning process, which may lead to occasional overconfidence in its answers, especially when faced with ambiguous or unfamiliar tasks.
  • Sensitive Content: While efforts have been made to minimize harmful or unsafe outputs, the model might still generate biased or incorrect responses in rare instances. Use in highly sensitive applications should be done with caution.

Model Repository

The model is hosted on Hugging Face: eliceai/helpy-edu-b-0916

License

This model follows the license provided by meta-lama, which is GNU General Public License v3.0. Please review and adhere to the licensing requirements before use.


Evaluation Benchmarks

The following benchmarks compare the Helpy-EDU-B-0916 checkpoint against the Llama 3.1 8B Instruct baseline model across multiple evaluation datasets:

Model Human Eval Human Eval + MMLU KMMLU KOBEST Chinese (↓)
Llama 3.1 8B Instruct (Baseline) 0.677 0.610 0.678 0.419 0.603 ~16%
Helpy-EDU-B-0916 (ckpt-0916) 0.680 0.620 0.673 0.399 0.568 0

Benchmark Descriptions:

  • Human Eval: Measures general instruction-following performance.
  • Human Eval +: Enhanced instruction-following task set for complex queries.
  • MMLU (Massive Multitask Language Understanding): A benchmark designed to evaluate multitask language understanding capabilities.
  • KMMLU: Korean variant of the MMLU benchmark, testing Korean-specific multitask understanding.
  • KOBEST: Evaluates performance on Korean-language understanding and safe conversation generation.
  • Chinese: Indicate the percentage of answers from the LLM infected with random Chinese characters, lower is better.

Disclaimer: The model is provided as-is, and users are responsible for its application. Please ensure ethical and responsible usage in all deployments.