---
license: llama2
language:
  - en
library_name: transformers
thumbnail: https://huggingface.co./HyperbeeAI/Tulpar-7b-v0/resolve/main/tulpar.png
---

## Model Description

Tulpar-7b is a Llama2-7b-based model trained by HyperbeeAI. It was trained on a filtered and preprocessed instruction-finetuning dataset that includes GPT-4-generated data as well as curated datasets such as Airoboros and Platypus.

## Example Usage

Loading the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HyperbeeAI/Tulpar-7b-v0")
model = AutoModelForCausalLM.from_pretrained("HyperbeeAI/Tulpar-7b-v0", device_map="auto")
```
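For a 7B model, loading the weights in half precision noticeably reduces memory use. The following is a minimal sketch, assuming a GPU with enough memory and that `torch` is installed; the `torch_dtype` choice is an assumption and is not part of the original card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: fp16 weights fit on the available GPU(s); adjust the dtype as needed.
tokenizer = AutoTokenizer.from_pretrained("HyperbeeAI/Tulpar-7b-v0")
model = AutoModelForCausalLM.from_pretrained(
    "HyperbeeAI/Tulpar-7b-v0",
    torch_dtype=torch.float16,
    device_map="auto",
)
```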

You can run inference with either of the following prompt formats:

```python
input_text = "What is deep learning?"
prompt = f"### User: {input_text}\n\n### Assistant:\n"
# Move inputs to the model's device so generation works when the model is on GPU.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, do_sample=True, top_p=0.95, top_k=0, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```

or, alternatively:

```python
input_text = "What is deep learning?"
prompt = f"Question: {input_text}\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, do_sample=True, top_p=0.95, top_k=0, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```
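The decoded output includes the prompt itself. If you only want the model's reply, a minimal sketch (continuing from the snippet above; this helper is not part of the original card) is to slice off the prompt tokens before decoding:

```python
# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
print(response)
```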

## Evaluation

Our offline HF Leaderboard evaluation results:

| Task          | Metric   | Value  |
|---------------|----------|--------|
| arc_challenge | acc_norm | 0.5614 |
| hellaswag     | acc_norm | 0.7901 |
| mmlu          | acc_norm | 0.5242 |
| truthfulqa_mc | mc2      | 0.5160 |
| **Average**   | -        | 0.5979 |

Other GPT4All evaluation results:

| Task        | Metric   | Value  |
|-------------|----------|--------|
| boolq       | acc      | 0.8306 |
| piqa        | acc      | 0.7905 |
| piqa        | acc_norm | 0.7884 |
| winogrande  | acc      | 0.7159 |
| openbookqa  | acc      | 0.356  |
| openbookqa  | acc_norm | 0.448  |
| **Average** (including HF Leaderboard datasets) | - | 0.6468 |

BigBenchHard results:

| Task | Metric | Value |
|------|--------|-------|
| bigbench_causal_judgement | multiple_choice_grade | 0.6105 |
| bigbench_date_understanding | multiple_choice_grade | 0.6423 |
| bigbench_disambiguation_qa | multiple_choice_grade | 0.3643 |
| bigbench_dyck_languages | multiple_choice_grade | 0.2000 |
| bigbench_formal_fallacies_syllogisms_negation | multiple_choice_grade | 0.5002 |
| bigbench_geometric_shapes | multiple_choice_grade | 0.0000 |
| bigbench_geometric_shapes | exact_str_match | 0.0000 |
| bigbench_hyperbaton | multiple_choice_grade | 0.6754 |
| bigbench_logical_deduction_five_objects | multiple_choice_grade | 0.2700 |
| bigbench_logical_deduction_seven_objects | multiple_choice_grade | 0.1929 |
| bigbench_logical_deduction_three_objects | multiple_choice_grade | 0.4133 |
| bigbench_movie_recommendation | multiple_choice_grade | 0.3000 |
| bigbench_navigate | multiple_choice_grade | 0.5000 |
| bigbench_reasoning_about_colored_objects | multiple_choice_grade | 0.5750 |
| bigbench_ruin_names | multiple_choice_grade | 0.3281 |
| bigbench_salient_translation_error_detection | multiple_choice_grade | 0.2976 |
| bigbench_snarks | multiple_choice_grade | 0.6022 |
| bigbench_sports_understanding | multiple_choice_grade | 0.5122 |
| bigbench_temporal_sequences | multiple_choice_grade | 0.1450 |
| bigbench_tracking_shuffled_objects_five_objects | multiple_choice_grade | 0.1976 |
| bigbench_tracking_shuffled_objects_seven_objects | multiple_choice_grade | 0.1440 |
| bigbench_tracking_shuffled_objects_three_objects | multiple_choice_grade | 0.4133 |
| **Average** | - | 0.3754 |
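The numbers above come from our offline runs. As a rough, hedged sketch of how similar figures could be reproduced, the snippet below uses the Python API of EleutherAI's lm-evaluation-harness (`lm_eval`); the exact harness version, task names, and few-shot settings used for the tables above are not documented here, so treat them as assumptions and expect small differences:

```python
# Requires: pip install lm-eval  (EleutherAI lm-evaluation-harness)
# Assumption: a recent lm_eval version that exposes simple_evaluate; the task
# names and settings below are illustrative, not the documented configuration
# behind the tables in this card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=HyperbeeAI/Tulpar-7b-v0,dtype=float16",
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc2"],
    num_fewshot=None,  # use each task's default few-shot setting
    batch_size=8,
)
print(results["results"])
```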

## Ethical Considerations and Limitations

Tulpar is a technology that carries potential risks and limitations. The model was finetuned only on English data, so use in other languages and many language-related scenarios is not covered. HyperbeeAI does not guarantee that its outputs are ethical, accurate, unbiased, or objective, nor does it endorse them. Before deploying this model, you are advised to perform safety testing for your specific use case.