---
license: apache-2.0
datasets:
- BrainGPT/train_valid_split_pmc_neuroscience_2002-2022_filtered_subset
tags:
- text-generation-inference
- peft
---
|
## About this model:

The model was developed and evaluated as part of this research paper: https://www.nature.com/articles/s41562-024-02046-9 (Fig. 5).
|
## Training details:

We fine-tuned Mistral-7B-v0.1 using LoRA. Training used a per-device batch size of 1 with a chunk size of 2048 tokens, the AdamW optimizer with a learning rate of 2e-5, gradient accumulation over 8 steps, a warm-up ratio of 0.03, a weight decay of 0.001, and a cosine learning rate scheduler, for a single epoch. LoRA adapters with rank 256, alpha 512, and dropout 0.1 were applied after all self-attention blocks and fully connected layers. This results in a total of 629,145,600 trainable parameters, roughly 8% of the base model's parameters. To optimize training performance, we employed bf16 mixed precision and data parallelism across 4 Nvidia A100 (80 GB) GPUs hosted on the Microsoft Azure platform; one epoch of training takes roughly 65 GPU-hours.
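A small back-of-the-envelope sketch of what these hyperparameters imply. The effective batch size follows the standard product of per-device batch size, gradient accumulation steps, and data-parallel GPUs; the LoRA parameter formula is the generic count for a rank-r adapter, and the 4096 x 4096 example dimension is illustrative (a typical attention projection size for a 7B model), not a figure from the paper:

```python
# Hyperparameters from the training description above.
per_device_batch = 1
grad_accum_steps = 8
num_gpus = 4
chunk_size = 2048  # tokens per sequence

# Effective batch size per optimizer step under data parallelism.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
tokens_per_step = effective_batch * chunk_size
print(effective_batch)  # 32 sequences per optimizer step
print(tokens_per_step)  # 65536 tokens per optimizer step

# A LoRA adapter on a weight of shape (d_out, d_in) adds two low-rank
# factors, B (d_out x r) and A (r x d_in), i.e. r * (d_in + d_out)
# trainable parameters per adapted matrix.
def lora_params(d_in: int, d_out: int, r: int = 256) -> int:
    return r * (d_in + d_out)

# Illustrative example: a square 4096 x 4096 projection at rank 256.
print(lora_params(4096, 4096))  # 2097152
```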
|
|
|
## Training data:

Please refer to the dataset card: https://huggingface.co./datasets/BrainGPT/train_valid_split_pmc_neuroscience_2002-2022_filtered_subset
|
|
|
## Model weights:

The current version of BrainGPT was fine-tuned from Mistral-7B-v0.1 with LoRA; `adapter_model.bin` contains only the LoRA adapter weights. To load and use the full model, you need to be granted access to Mistral-7B-v0.1 via https://huggingface.co./mistralai/Mistral-7B-v0.1.
|
|
|
## Load and use model:

```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Read the adapter config to find the base model it was trained on
config = PeftConfig.from_pretrained("BrainGPT/BrainGPT-7B-v0.2")

# Load the base model, then attach the LoRA adapter weights
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, "BrainGPT/BrainGPT-7B-v0.2")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
```
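Once loaded, the model can be prompted like any causal language model. A minimal generation sketch, continuing from the snippet above (the prompt text and generation settings here are illustrative assumptions, not from the paper; running it requires the gated Mistral-7B-v0.1 weights):

```python
import torch

# Hypothetical prompt; any neuroscience-related text works the same way.
prompt = "The hippocampus is involved in"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```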