	---
	license: llama3
	language:
	- tr
	- en
	base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
	model-index:
	- name: MARS
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge TR v0.2
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc
	      value: 43.85
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag TR
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc
	      value: 46.64
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA TR v0.2
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: acc
	      name: accuracy
	      value: 48.66
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande TR v0.2
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 52.84
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k TR v0.2
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 59.30
	      name: accuracy
	pipeline_tag: text-generation
	---


	<img src="MARS-2.0.png" alt="Curiosity MARS model logo" style="border-radius: 1rem; width: 100%">


	<div style="display: flex; justify-content: center; align-items: center; flex-direction: column">
	<h1 style="font-size: 5em; margin-bottom: 0; padding-bottom: 0;">MARS-v0.2</h1>
	<aside>by <a href="https://curiosity.tech">Curiosity Technology</a></aside>
	</div>

	MARS-v0.2 is the second iteration of Curiosity Technology models, built on the foundation of Llama 3.1 8B. This version expands upon the initial MARS model by fine-tuning it with a more comprehensive dataset, with an increased emphasis on mathematical data to enhance its reasoning and problem-solving capabilities.

	We've continued our commitment to Turkish language processing, utilizing both in-house Turkish datasets and a broader selection of translated open-source datasets. We believe this version will serve the community with even more versatility and depth.

	MARS was trained for 3 days on 4x A100 GPUs.

	## Model Details

	- Base Model: Meta Llama 3.1 8B Instruct
	- Training Dataset: In-house & Translated Open Source Turkish Datasets
	- Training Method: LoRA Fine Tuning
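
To give a feel for why LoRA keeps fine-tuning affordable, here is a minimal sketch of the parameter arithmetic. LoRA freezes a weight matrix of shape `d_out x d_in` and trains two low-rank factors of shapes `d_out x r` and `r x d_in` instead. The rank `r = 16` and the single square `4096 x 4096` projection are assumptions chosen for illustration; this card does not state the rank or the target modules actually used for MARS.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical values for illustration only: one square projection and rank 16.
d_out, d_in, rank = 4096, 4096, 16

full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, rank)

print(f"full matrix:  {full:,} parameters")
print(f"LoRA factors: {lora:,} parameters ({100 * lora / full:.2f}% of full)")
```

Under these assumed values the trainable factors are well under 1% of the size of the frozen matrix, which is the main reason a LoRA run fits on a small number of GPUs.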


	## How to use

	You can run conversational inference using the Transformers pipeline abstraction, or by leveraging the Auto classes with the `generate()` function. Let's see examples of both.

	### Transformers pipeline

	```python
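	import transformers
	import torch

	model_id = "curiositytech/MARS-v0.2"

	# Load the model in bfloat16 and let Accelerate place it on the available devices.
	pipeline = transformers.pipeline(
	    "text-generation",
	    model=model_id,
	    model_kwargs={"torch_dtype": torch.bfloat16},
	    device_map="auto",
	)

	# System: "You are a pirate chatbot who talks like a pirate!" / User: "Who are you?"
	messages = [
	    {"role": "system", "content": "Sen korsan gibi konuşan bir korsan chatbotsun!"},
	    {"role": "user", "content": "Sen kimsin?"},
	]

	# Stop on either the regular EOS token or the Llama 3 end-of-turn token.
	terminators = [
	    pipeline.tokenizer.eos_token_id,
	    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
	]

	outputs = pipeline(
	    messages,
	    max_new_tokens=256,
	    eos_token_id=terminators,
	    do_sample=True,
	    temperature=0.6,
	    top_p=0.9,
	)
	# The last message in the returned conversation is the assistant's reply.
	print(outputs[0]["generated_text"][-1])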
	```
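
The `temperature=0.6` and `top_p=0.9` arguments above control how the next token is drawn. As a rough illustration of what they do, here is a dependency-free sketch of temperature scaling followed by nucleus (top-p) filtering. It is not the Transformers implementation, and `sample_top_p` and the toy logits are hypothetical names and values made up for this example.

```python
import math
import random


def sample_top_p(logits, temperature=0.6, top_p=0.9, rng=random):
    """Sample an index from logits using temperature scaling and top-p filtering."""
    # A temperature below 1 sharpens the distribution; subtract the max for stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose cumulative mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw one of the kept tokens in proportion to its probability.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


# Toy logits for a four-token vocabulary (made-up numbers).
toy_logits = [2.0, 1.0, 0.5, -1.0]
print(sample_top_p(toy_logits, rng=random.Random(0)))
```

Lowering `temperature` or `top_p` makes the output more deterministic, while raising them allows more varied answers.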

	### Transformers AutoModelForCausalLM

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	model_id = "curiositytech/MARS-v0.2"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)

	# System: "You are a pirate chatbot who talks like a pirate!" / User: "Who are you?"
	messages = [
	    {"role": "system", "content": "Sen korsan gibi konuşan bir korsan chatbotsun!"},
	    {"role": "user", "content": "Sen kimsin?"},
	]

	# Render the conversation with the model's chat template and tokenize it.
	input_ids = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    return_tensors="pt",
	).to(model.device)

	# Stop on either the regular EOS token or the Llama 3 end-of-turn token.
	terminators = [
	    tokenizer.eos_token_id,
	    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
	]

	outputs = model.generate(
	    input_ids,
	    max_new_tokens=256,
	    eos_token_id=terminators,
	    do_sample=True,
	    temperature=0.6,
	    top_p=0.9,
	)
	# Drop the prompt tokens so that only the newly generated reply is decoded.
	response = outputs[0][input_ids.shape[-1]:]
	print(tokenizer.decode(response, skip_special_tokens=True))
	```
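
To show why `<|eot_id|>` is passed as a terminator, the sketch below approximates the text that `apply_chat_template` builds from `messages`. It follows the general Llama 3 chat layout as an assumption; the template shipped with the tokenizer is authoritative and may add further content, such as date lines in the system turn for Llama 3.1. The helper `format_llama3_prompt` is a hypothetical name, not part of Transformers.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Approximate the Llama 3 chat layout; the tokenizer's template is authoritative."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the content, then an end-of-turn token.
        parts.append(f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n")
        parts.append(f"{message['content'].strip()}<|eot_id|>")
    if add_generation_prompt:
        # An open assistant header asks the model to write the next turn.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "Sen korsan gibi konuşan bir korsan chatbotsun!"},
    {"role": "user", "content": "Sen kimsin?"},
]
print(format_llama3_prompt(messages))
```

Because every turn ends with `<|eot_id|>`, the model emits that token when its reply is complete, and listing it in `eos_token_id` stops generation at that point.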