codybum committed
Commit 91849f7
1 Parent(s): 9677e16

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -8,7 +8,7 @@ library_name: transformers
 
  The MELT-Mixtral-8x7B-Instruct-v0.1 Large Language Model (LLM) is a generative text model pre-trained and fine-tuned using publicly available medical data.
 
- MELT-Mixtral-8x7B-Instruct-v0.1 is 68.2% accurate across 3 USMLE benchmarks, surpassing the pass mark (>60%) on U.S. Medical Licensing Examination (USMLE) style questions.
+ MELT-Mixtral-8x7B-Instruct-v0.1 is 68.2% accurate across 3 medical examination benchmarks (USMLE, Indian AIIMS, and NEET), surpassing the pass mark (>60%) on U.S. Medical Licensing Examination (USMLE) style questions.
 
  To the best of our understanding, our model is 4% less accurate than Google's 540-billion-parameter [Med-Palm](https://sites.research.google/med-palm/), which is 10X larger.
 
@@ -16,7 +16,7 @@ To the best of our understanding our model is 4% less accurate than Google's 540
 
  The Medical Education Language Transformer (MELT) models have been trained on a wide range of text, chat, Q/A, and instruction data in the medical domain.
 
- While the model was evaluated using publicly available [USMLE](https://www.usmle.org/) example questions, its use is intended to be more broadly applicable.
+ While the model was evaluated using publicly available [USMLE](https://www.usmle.org/), Indian AIIMS, and NEET example questions, its use is intended to be more broadly applicable.
 
  ### Model Description
 
@@ -88,10 +88,10 @@ The following datasets were used for training:
 
  <!-- This section describes the evaluation protocols and provides the results. -->
 
- MELT-Mixtral-8x7B-Instruct-v0.1 demonstrated an average 4.42% improvement over Mixtral-8x7B-Instruct-v0.1 across 3 USMLE benchmarks.
+ MELT-Mixtral-8x7B-Instruct-v0.1 demonstrated an average 4.42% relative improvement over Mixtral-8x7B-Instruct-v0.1 across the 3 benchmarks (65.31% → 68.2%).
  The base Mixtral-8x7B-Instruct-v0.1 model already performs significantly better (65.31%) than the base [llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) model (35.26%) and our [MELT-llama-2-7b-chat-v0.1](https://huggingface.co/IBI-CAAI/MELT-llama-2-7b-chat-v0.1) model (46.33%).
 
- While there was limited improvement on the USMLE benchmarks, our training data contained a broad collection of medical text, chats, and multiple-choice questions that would not be captured on the USMLE evaluation.
+ While there was limited improvement on the benchmarks, our training data contained a broad collection of medical text, chats, and multiple-choice questions that would not be captured by the multiple-choice evaluations.
 
  ### Mixtral-8x7B-Instruct-v0.1
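
Since the card declares `library_name: transformers`, a minimal loading sketch may be useful context for the change above. This is an assumption-laden sketch, not the card's own usage section: the repo id `IBI-CAAI/MELT-Mixtral-8x7B-Instruct-v0.1` is inferred from the org's linked MELT-llama-2-7b-chat-v0.1 checkpoint, and the prompt formatting relies on the standard Mixtral-Instruct chat template.

```python
# Hypothetical usage sketch; the repo id is assumed, not confirmed by the diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IBI-CAAI/MELT-Mixtral-8x7B-Instruct-v0.1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Mixtral-Instruct checkpoints use the [INST] ... [/INST] chat format;
# apply_chat_template reads it from the tokenizer config.
messages = [{"role": "user", "content": "List the classic symptoms of myocardial infarction."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Note that `device_map="auto"` requires the `accelerate` package, and an 8x7B Mixtral needs substantial GPU memory (or quantized loading) to run at all.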