---
library_name: transformers
license: llama3.1
language:
- ko
- en
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
model-index:
- name: HelpyEdu
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval (Prompted)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.682
      verified: false
---

# Model Card: Helpy-EDU-B-0916

## Model Details

- **Model Name**: Helpy-EDU-B-0916
- **Base Model**: [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
- **Model Size**: 8 billion parameters
- **Model Type**: Instruction-tuned Large Language Model (LLM)

## Model Description

Helpy-EDU-B-0916 is a large language model fine-tuned to assist with educational tasks, with a focus on safe and ethical conversation in both English and Korean. It is designed to provide accurate, helpful, and context-aware responses to instructional prompts, making it well suited for education, tutoring, and content generation.

The model was fine-tuned from [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), leveraging high-quality data sources and optimized for multilingual (Korean/English) environments.

## Training Data

The model was fine-tuned on the following datasets:

- **AI Instructions from AI HUB**: A diverse set of AI-related instructions that strengthens the model's ability to understand and follow detailed prompts.
- **Korean Safe Conversations**: A curated dataset emphasizing safe, respectful, and culturally sensitive dialogue in Korean, helping the model adhere to ethical communication standards when interacting in Korean.

## Intended Use

Helpy-EDU-B-0916 is tailored for the following use cases:

- **Educational Assistance**: Answering student questions, generating lesson content, and aiding language learning.
- **Bilingual Conversations**: Supporting both English and Korean interactions with a focus on safety and appropriateness.
- **AI Instruction Following**: Providing detailed, context-aware responses to instructional queries.

## Limitations and Biases

- **Korean Language Proficiency**: Although fine-tuned on Korean safe conversations, the model may still struggle with certain idiomatic or dialectal variations of Korean.
- **Instruction Bias**: Because of the instruction-tuning process, the model can be overconfident in its answers, especially on ambiguous or unfamiliar tasks.
- **Sensitive Content**: While efforts have been made to minimize harmful or unsafe outputs, the model may still produce biased or incorrect responses in rare cases. Use in highly sensitive applications should be approached with caution.

## Model Repository

The model is hosted on Hugging Face: [eliceai/helpy-edu-b-0916](https://huggingface.co/eliceai/helpy-edu-b-0916)

## License

This model follows the license of its base model, [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct): the [Llama 3.1 Community License](https://llama.meta.com/llama3_1/license/). Please review and adhere to the licensing requirements before use.
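
## How to Use

The snippet below is a minimal loading-and-generation sketch using the `transformers` library. The chat prompt, dtype, and sampling settings are illustrative assumptions, not settings recommended by the model authors.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "eliceai/helpy-edu-b-0916"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bfloat16 support
    device_map="auto",
)

# Example tutoring conversation; the system prompt is a placeholder.
messages = [
    {"role": "system", "content": "You are Helpy, a safe and helpful bilingual (Korean/English) tutor."},
    {"role": "user", "content": "피타고라스 정리를 쉽게 설명해 줘."},  # "Explain the Pythagorean theorem simply."
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Strip the prompt tokens and print only the newly generated answer.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Assuming the checkpoint keeps the Llama 3.1 chat template of its base model, `apply_chat_template` formats the conversation automatically; otherwise, build the prompt manually according to the tokenizer's template.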
---

## Evaluation Benchmarks

The following benchmarks compare the **Helpy-EDU-B-0916** checkpoint against the **Llama 3.1 8B Instruct** baseline across multiple evaluation datasets:

| **Model** | **HumanEval** | **HumanEval+** | **MMLU** | **KMMLU** | **KOBEST** | **Chinese (↓)** |
|---|---|---|---|---|---|---|
| Llama 3.1 8B Instruct (baseline) | 0.677 | 0.610 | 0.678 | 0.419 | 0.603 | ~16% |
| Helpy-EDU-B-0916 (ckpt-0916) | 0.680 | 0.620 | 0.673 | 0.399 | 0.568 | 0% |

### Benchmark Descriptions

- **HumanEval**: Measures functional correctness of generated Python code, reported here as pass@1.
- **HumanEval+**: An extended version of HumanEval with additional test cases per problem, providing a stricter correctness check.
- **MMLU (Massive Multitask Language Understanding)**: Evaluates multitask language understanding across a broad range of subjects.
- **KMMLU**: A Korean-language multitask knowledge benchmark in the style of MMLU, testing Korean-specific understanding.
- **KOBEST**: A suite of Korean language-understanding tasks.
- **Chinese**: The percentage of model answers containing unintended Chinese characters; lower is better.

---

**Disclaimer**: The model is provided as-is, and users are responsible for its application. Please ensure ethical and responsible usage in all deployments.
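
### Appendix: How pass@1 Is Computed

The HumanEval scores above are reported as pass@1. For reference, the snippet below sketches the standard unbiased pass@k estimator from the original HumanEval paper (Chen et al., 2021); it is an illustrative implementation, not the evaluation code used to produce the numbers in this card.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total completions sampled per problem
    c: number of those completions that pass all unit tests
    k: the k in pass@k (k=1 for the scores reported above)
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 20 samples per problem, 14 passing -> pass@1 estimate of 0.7
print(round(pass_at_k(n=20, c=14, k=1), 3))
```

With k=1 the estimator reduces to c/n, i.e., the fraction of sampled completions that pass the tests.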