	---
	language: pt
	license: mit
	base_model: mistralai/Mistral-7B-Instruct-v0.1
	pipeline_tag: text-generation
	model-index:
	- name: bluearara-7B-instruct-v02
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: ENEM Challenge (No Images)
	      type: eduagarcia/enem_challenge
	      split: train
	      args:
	        num_few_shot: 3
	    metrics:
	    - type: acc
	      value: 28.48
	      name: accuracy
	    source:
	      url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=fernandosola/bluearara-7B-instruct-v02
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: BLUEX (No Images)
	      type: eduagarcia-temp/BLUEX_without_images
	      split: train
	      args:
	        num_few_shot: 3
	    metrics:
	    - type: acc
	      value: 21.84
	      name: accuracy
	    source:
	      url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=fernandosola/bluearara-7B-instruct-v02
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: OAB Exams
	      type: eduagarcia/oab_exams
	      split: train
	      args:
	        num_few_shot: 3
	    metrics:
	    - type: acc
	      value: 27.84
	      name: accuracy
	    source:
	      url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=fernandosola/bluearara-7B-instruct-v02
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Assin2 RTE
	      type: assin2
	      split: test
	      args:
	        num_few_shot: 15
	    metrics:
	    - type: f1_macro
	      value: 56.04
	      name: f1-macro
	    source:
	      url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=fernandosola/bluearara-7B-instruct-v02
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Assin2 STS
	      type: eduagarcia/portuguese_benchmark
	      split: test
	      args:
	        num_few_shot: 15
	    metrics:
	    - type: pearson
	      value: 21.3
	      name: pearson
	    source:
	      url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=fernandosola/bluearara-7B-instruct-v02
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: FaQuAD NLI
	      type: ruanchaves/faquad-nli
	      split: test
	      args:
	        num_few_shot: 15
	    metrics:
	    - type: f1_macro
	      value: 45.33
	      name: f1-macro
	    source:
	      url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=fernandosola/bluearara-7B-instruct-v02
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HateBR Binary
	      type: ruanchaves/hatebr
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: f1_macro
	      value: 84.14
	      name: f1-macro
	    source:
	      url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=fernandosola/bluearara-7B-instruct-v02
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: PT Hate Speech Binary
	      type: hate_speech_portuguese
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: f1_macro
	      value: 52.17
	      name: f1-macro
	    source:
	      url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=fernandosola/bluearara-7B-instruct-v02
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: tweetSentBR
	      type: eduagarcia/tweetsentbr_fewshot
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: f1_macro
	      value: 66.91
	      name: f1-macro
	    source:
	      url: https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard?query=fernandosola/bluearara-7B-instruct-v02
	      name: Open Portuguese LLM Leaderboard
	---


	# Blue Arara 7b instruct GGUF
	- Model creator: [fernandosola](https://huggingface.co./fernandosola).
	- Base model: [bluearara-7b-instruct](https://huggingface.co./fernandosola/bluearara-7B-instruct).
	- Foundation model: [Mistral 7B Instruct v0.2](https://huggingface.co./mistralai/Mistral-7B-Instruct-v0.2).

	<!-- description start -->
	## Description

	This repository contains the GGUF-quantized model of [bluearara-7b-instruct](https://huggingface.co./fernandosola/bluearara-7B-instruct). <br>
	These files were quantized by AByrth.

	<div style="width: auto; margin-left: auto; margin-right: auto">
	<img src="https://i.imgur.com/ZiM1Ua0.png" alt="Blue macaw (arara-azul)" style="width:256px; min-width: 100px; margin: auto;">
	</div>



	<!-- description end -->

	<!-- README_GGUF.md-about-gguf start -->

	### About the GGUF format

	GGUF is a file format for quantized LLMs, introduced by the llama.cpp team in August 2023.

	Suggested client applications and libraries that support GGUF models:

	* [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option.
	* [text-generation-webui](https://github.com/oobabooga/text-generation-webui), a web UI with many features and powerful extensions. Supports GPU acceleration.
	* [GPT4All](https://gpt4all.io/index.html), a free and open-source GUI that runs locally, compatible with Windows, Linux and macOS, with GPU acceleration.
	* [LM Studio](https://lmstudio.ai/), an easy-to-use local GUI application for Windows, Linux and macOS (Apple Silicon), with GPU acceleration.
	* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU acceleration, LangChain support and an OpenAI-compatible API server.

	<!-- README_GGUF.md-about-gguf end -->
	<!-- repositories-available start -->
	## Available repositories

	* [GGUF models at 3, 4, 5, 6 and 8 bits for CPU or CPU+GPU inference](https://huggingface.co./fernandosola/bluearara-7B-instruct-GGUF)
	* [GPTQ model for GPU inference, quantized to 4 bits.](https://huggingface.co./fernandosola/bluearara-7B-instruct-GPTQ)
	* [Original unquantized model in PyTorch fp16 format, for GPU inference and further conversions](https://huggingface.co./fernandosola/bluearara-7B)
	<!-- repositories-available end -->

	<!-- prompt-template start -->
	## Prompt template: Mistral

	```
	<s>[INST] {prompt} [/INST]
	```
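
As a minimal sketch of applying the template above from Python, the helper below wraps a user message in the Mistral instruction markers. The name `build_prompt` is hypothetical and not part of any library. Some runtimes may insert the `<s>` token themselves, so check your client before including it.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in the Mistral instruct template shown above.

    `build_prompt` is a hypothetical helper used only for illustration.
    """
    return f"<s>[INST] {user_message} [/INST]"


if __name__ == "__main__":
    # Example with a Portuguese question, since the model targets Portuguese.
    print(build_prompt("Qual é a capital do Brasil?"))
```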

	<!-- prompt-template end -->

	<!-- README_GGUF.md-provided-files start -->
	## Quantized GGUF models
	| Name | Quantization | Bits | Size | RAM usage | Use case |
	| ---- | ---- | ---- | ---- | ---- | ----- |
	| [bluearara-7B-instruct_Q3_K_M.gguf](https://huggingface.co./fernandosola/bluearara-7B-instruct-GGUF/blob/main/bluearara-7B-instruct_Q3_K_M.gguf) | Q3_K_M | 3 | 3.52 GB | 6.02 GB | Small, moderate quality loss |
	| [bluearara-7B-instruct_Q4_0.gguf](https://huggingface.co./fernandosola/bluearara-7B-instruct-GGUF/blob/main/bluearara-7B-instruct_Q4_0.gguf) | Q4_0 | 4 | 4.11 GB | 6.61 GB | Small, balanced quality |
	| [bluearara-7B-instruct_Q5_0.gguf](https://huggingface.co./fernandosola/bluearara-7B-instruct-GGUF/blob/main/bluearara-7B-instruct_Q5_0.gguf) | Q5_0 | 5 | 5.00 GB | 7.50 GB | Medium, balanced quality |
	| [bluearara-7B-instruct_Q6_K.gguf](https://huggingface.co./fernandosola/bluearara-7B-instruct-GGUF/blob/main/bluearara-7B-instruct_Q6_K.gguf) | Q6_K | 6 | 5.94 GB | 8.44 GB | Large, very low quality loss |
	| [bluearara-7B-instruct_Q8_0.gguf](https://huggingface.co./fernandosola/bluearara-7B-instruct-GGUF/blob/main/bluearara-7B-instruct_Q8_0.gguf) | Q8_0 | 8 | 7.70 GB | 10.20 GB | Large, minimal quality loss |

	Note: the RAM figures assume CPU-only inference, with no model layers offloaded to the GPU. RAM usage drops when one or more layers are offloaded to the GPU.
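
The RAM figures listed for each file appear to follow a simple rule: file size plus a fixed 2.5 GB. The sketch below reproduces them under that assumption. The 2.5 GB overhead is inferred from the listed numbers, not documented by the author, and `estimate_ram_gb` is a hypothetical helper.

```python
# File sizes in GB, as listed for each quantization.
FILE_SIZES_GB = {
    "Q3_K_M": 3.52,
    "Q4_0": 4.11,
    "Q5_0": 5.00,
    "Q6_K": 5.94,
    "Q8_0": 7.70,
}

# Assumption: fixed overhead inferred from the listed figures
# (RAM usage minus file size is 2.5 GB for every quantization).
OVERHEAD_GB = 2.5


def estimate_ram_gb(quant: str) -> float:
    """Estimate CPU-only RAM usage for a given quantization, in GB."""
    return round(FILE_SIZES_GB[quant] + OVERHEAD_GB, 2)


if __name__ == "__main__":
    for quant in FILE_SIZES_GB:
        print(f"{quant}: {estimate_ram_gb(quant):.2f} GB")
```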

	<!-- README_GGUF.md-provided-files end -->

	<!-- README_GGUF.md-how-to-download start -->
	## How to download the GGUF files

	Note for manual downloaders: avoid cloning the entire repository! Pick only the file with the quantization you want.

	The clients/libraries listed below will download automatically for you, providing a list of available models to choose from:

	* LM Studio
	* oobabooga/text-generation-webui

	### Using oobabooga/text-generation-webui

	Under Download Model, enter the repository and model name: fernandosola/bluearara-7B-instruct-v02

	Finally, click "Download".

	### On the command line

	Use the `huggingface-hub` Python library:

	```shell
	pip3 install huggingface-hub
	```

	Then you can download any individual model file to the current directory, at high speed, with a command like this:

	```shell
	huggingface-cli download fernandosola/bluearara-7B-instruct-GGUF bluearara-7B-instruct_Q4_0.gguf --local-dir . --local-dir-use-symlinks False
	```
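
The same download can also be done from Python. The sketch below uses `hf_hub_download` from the `huggingface_hub` library; the file-name pattern is assumed from the file links listed in this card, and `gguf_filename` and `download_gguf` are hypothetical helper names.

```python
REPO_ID = "fernandosola/bluearara-7B-instruct-GGUF"


def gguf_filename(quant: str) -> str:
    """Build the file name for a quantization (pattern assumed from this card's file links)."""
    return f"bluearara-7B-instruct_{quant}.gguf"


def download_gguf(quant: str = "Q4_0", local_dir: str = ".") -> str:
    """Download a single GGUF file and return its local path."""
    # Imported here so the helper above works without the library installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(quant),
        local_dir=local_dir,
    )
```

Calling `download_gguf("Q4_0")` fetches only that file, which matches the advice above to avoid cloning the whole repository.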

	<details>
	<summary>Advanced huggingface-cli download options (click to read)</summary>

	You can also download several files at once with a pattern:

	```shell
	huggingface-cli download fernandosola/bluearara-7B-instruct-GGUF --local-dir . --local-dir-use-symlinks False --include='*Q4*gguf'
	```

	To speed up downloads on fast connections (1 Gbit/s or higher), install `hf_transfer`:

	```shell
	pip3 install hf_transfer
	```

	And set the environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:

	```shell
	HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download fernandosola/bluearara-7B-instruct-GGUF bluearara-7B-instruct_Q4_0.gguf --local-dir . --local-dir-use-symlinks False
	```

	Windows command-line users: you can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before running the download command.
	</details>
	<!-- README_GGUF.md-how-to-download end -->
	<!-- README_GGUF.md-how-to-run start -->

	## How to run with `llama.cpp`

	Download the latest version of `llama.cpp`:
	```shell
	git clone https://github.com/ggerganov/llama.cpp.git
	```

	Build the native C++ applications with `make`:
	```shell
	cd llama.cpp
	make
	```

	Run the previously downloaded quantized model:


	```shell
	./main -ngl 35 -m bluearara-7B-instruct_Q4_0.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<s>[INST] {prompt} [/INST]"
	```

	Change `-ngl 35` to the number of layers you want to offload to the GPU. Remove it if you have no GPU available.

	Change `-c 2048` to the desired context size. The maximum is 32768 (32K). However, long contexts require a lot of RAM. Reduce it if you run into problems.
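
To give a rough sense of why long contexts need so much memory, the sketch below estimates the size of the key/value cache as a function of context length. It assumes the published Mistral 7B architecture (32 layers, 8 key/value heads, head dimension 128) and a 16-bit cache; these values are not stated in this card, and actual usage depends on the runtime and its cache settings.

```python
# Assumptions (not from this model card): Mistral 7B architecture values.
N_LAYERS = 32        # transformer layers
N_KV_HEADS = 8       # key/value heads (grouped-query attention)
HEAD_DIM = 128       # dimension per head
BYTES_PER_VALUE = 2  # 16-bit cache entries


def kv_cache_bytes(n_ctx: int) -> int:
    """Estimate KV-cache size in bytes for a context of n_ctx tokens."""
    # Factor 2 accounts for storing both keys and values.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return n_ctx * per_token


if __name__ == "__main__":
    for n_ctx in (2048, 8192, 32768):
        gib = kv_cache_bytes(n_ctx) / 1024**3
        print(f"n_ctx={n_ctx}: about {gib:.2f} GiB of KV cache")
```

Under these assumptions the cache grows linearly with the context size, from about 0.25 GiB at 2048 tokens to about 4 GiB at 32768 tokens, on top of the model weights.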

	If you want a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.

	For other parameters, see [the llama.cpp documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md).
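
If you prefer to launch the binary from a script, the sketch below assembles the same command line in Python and drops `-ngl` when no GPU is available. It only uses the flags shown in the command above; `build_main_command` is a hypothetical helper, and the `./main` path assumes you run it from the `llama.cpp` build directory.

```python
from typing import List, Optional


def build_main_command(
    model_path: str,
    prompt: str,
    n_gpu_layers: Optional[int] = 35,
    n_ctx: int = 2048,
) -> List[str]:
    """Build the argument list for llama.cpp's ./main binary."""
    cmd = ["./main"]
    # Only pass -ngl when layers should be offloaded to a GPU.
    if n_gpu_layers:
        cmd += ["-ngl", str(n_gpu_layers)]
    cmd += [
        "-m", model_path,
        "--color",
        "-c", str(n_ctx),
        "--temp", "0.7",
        "--repeat_penalty", "1.1",
        "-n", "-1",
        "-p", f"<s>[INST] {prompt} [/INST]",
    ]
    return cmd


if __name__ == "__main__":
    # Print the command for a CPU-only run instead of executing it.
    print(build_main_command("bluearara-7B-instruct_Q4_0.gguf", "Olá!", n_gpu_layers=None))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to execute the model.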

	## Using `text-generation-webui`

	More instructions can be found in the text-generation-webui documentation, here: [text-generation-webui/docs/04 ‐ Model Tab.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/04%20%E2%80%90%20Model%20Tab.md#llamacpp).

	## Running with Python

	You can use GGUF models from Python with the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library.

	### Example:

	#### Simple llama-cpp-python example code

	```python
	from llama_cpp import Llama

	# Set n_gpu_layers to the number of layers to offload to the GPU. Set it to 0 if no GPU acceleration is available on your system.
	llm = Llama(
	    model_path="./bluearara-7B-instruct_Q4_0.gguf",  # Download the model first
	    n_ctx=32768,  # Maximum context size. Reduce it if you have little RAM available
	    n_threads=8,  # Number of CPU threads to use. Experiment to find the ideal number for your system.
	    n_gpu_layers=35,  # Number of layers to offload to the GPU, if GPU acceleration is available
	)

	# Running a simple inference
	output = llm(
	    "<s>[INST] {prompt} [/INST]",  # Prompt
	    max_tokens=512,  # Limit the response to 512 tokens
	    stop=["</s>"],  # Example stop token - check the one for your model
	    echo=True,  # True to echo the prompt back in the output
	)

	# Chat Completion API
	llm = Llama(model_path="./bluearara-7B-instruct_Q4_0.gguf", chat_format="llama-2")  # Set chat_format according to the model you are using
	llm.create_chat_completion(
	    messages=[
	        # The "system" role is not used here.
	        # {"role": "system", "content": "You are a story writing assistant."},
	        {
	            "role": "user",
	            "content": "You are a story writing assistant. \n\n Write a story about llamas.",
	        }
	    ]
	)
	```

	## Carbon Footprint - Environmental Concerns
	This subproject consumed 6 hours of accumulated compute on NVIDIA RTX 3070 GPUs (100W-270W) and an Intel Xeon E5 CPU (100W). All energy consumed was of solar origin, producing 0.0 g of CO2 or other greenhouse gases.


	# Open Portuguese LLM Leaderboard Evaluation Results

	Detailed results can be found [here](https://huggingface.co./datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/fernandosola/bluearara-7B-instruct-v02) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co./spaces/eduagarcia/open_pt_llm_leaderboard)

	| Metric | Value |
	|--------------------------|---------|
	|Average |44.89|
	|ENEM Challenge (No Images)| 28.48|
	|BLUEX (No Images) | 21.84|
	|OAB Exams | 27.84|
	|Assin2 RTE | 56.04|
	|Assin2 STS | 21.30|
	|FaQuAD NLI | 45.33|
	|HateBR Binary | 84.14|
	|PT Hate Speech Binary | 52.17|
	|tweetSentBR | 66.91|
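
The Average row appears to be the unweighted mean of the nine task scores. The short check below reproduces it from the values listed above; treating the average as a plain arithmetic mean is an assumption, although it matches the reported figure.

```python
# Scores copied from the results listed above.
scores = {
    "ENEM Challenge (No Images)": 28.48,
    "BLUEX (No Images)": 21.84,
    "OAB Exams": 27.84,
    "Assin2 RTE": 56.04,
    "Assin2 STS": 21.30,
    "FaQuAD NLI": 45.33,
    "HateBR Binary": 84.14,
    "PT Hate Speech Binary": 52.17,
    "tweetSentBR": 66.91,
}

# Unweighted mean across all tasks (assumed aggregation).
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 44.89
```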