krvhrv's picture
Adding Evaluation Results (#1)
3bf4de9 verified
metadata
language:
  - en
license: apache-2.0
tags:
  - medical
  - biology
  - chemistry
  - text-generation-inference
datasets:
  - krvhrv/Healix-Medical-Shot
model-index:
  - name: Healix-1.1B-V1-Chat-dDPO
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 30.55
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=health360/Healix-1.1B-V1-Chat-dDPO
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 44.78
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=health360/Healix-1.1B-V1-Chat-dDPO
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 24.64
            name: accuracy
        source:
          url: >-
            https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=health360/Healix-1.1B-V1-Chat-dDPO
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 41.55
        source:
          url: >-
            https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=health360/Healix-1.1B-V1-Chat-dDPO
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 56.51
            name: accuracy
        source:
          url: >-
            https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=health360/Healix-1.1B-V1-Chat-dDPO
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 0
            name: accuracy
        source:
          url: >-
            https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard?query=health360/Healix-1.1B-V1-Chat-dDPO
          name: Open LLM Leaderboard

Healix 1.1B Model Card

Model Description

Healix 1.1B is a state-of-the-art large language model specifically designed for medical applications. With 1.1 billion parameters, it has been trained on a vast corpus of medical literature to provide accurate and reliable responses to complex medical queries. This model aims to assist healthcare professionals and researchers by offering insights derived from medical data.

Training Data

The model leverages an extensive compilation of medical literature, including research papers, clinical trial reports, and textbooks, ensuring a broad understanding of medical topics.

Intended Use

This model is designed for medical research, clinical support, and healthcare applications. It serves to enhance medical text generation, query response, and evidence-based information dissemination. It is not a substitute for professional medical consultation.

Limitations

While Healix 1.1B offers advanced medical insights, it has limitations in data quality and representativeness, and may inadvertently produce biased or incorrect information.

Performance

Healix 1.1B demonstrated a remarkable accuracy of 64%, outperforming the LLAMA 2 7B model, which achieved an accuracy of 62% despite its larger size of 7 billion parameters. This highlights Healix 1.1B's superior ability to handle real emergency-focused medical questions, showcasing the effectiveness of specialized training and architecture in domain-specific applications.

Ethical Considerations

Users are urged to use Healix 1.1B responsibly, considering the ethical implications, patient privacy, and data security. The model's outputs should be used as a supplementary information source alongside professional medical judgment.

Papers

Details on the development, training, and evaluation of Healix 1.1B will be available in our forthcoming publications, offering insights into its creation and the advancements it brings to medical informatics.

Input Format

Use the Alpaca model format.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 33.00
AI2 Reasoning Challenge (25-Shot) 30.55
HellaSwag (10-Shot) 44.78
MMLU (5-Shot) 24.64
TruthfulQA (0-shot) 41.55
Winogrande (5-shot) 56.51
GSM8k (5-shot) 0.00