machinez/zephyr-orpo-141b-A35b-v0.1-exl2
This model was converted to EXL2 format from HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1. Refer to the original model card for more details on the model.
Each branch contains an individual bits-per-weight quantization; the main branch contains only the measurement.json for further conversions.
- 1.5 bits per weight: fits dual RTX 3090/4090 or triple NVIDIA Tesla P100 16 GB at 4K context
- 2.75 bits per weight: fits quad NVIDIA Tesla P100 16 GB at 16K context
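As a rough sanity check on those fits (a back-of-the-envelope estimate only: it ignores KV cache, activations, and loader overhead), the weight footprint is roughly parameter count × bits per weight ÷ 8:

```shell
# rough weight footprint in GB: params * bpw / 8 bits-per-byte
python3 -c "print(f'{141e9 * 1.5 / 8 / 1e9:.1f} GB')"   # ~26.4 GB at 1.5 bpw
python3 -c "print(f'{141e9 * 2.75 / 8 / 1e9:.1f} GB')"  # ~48.5 GB at 2.75 bpw
```

Those figures are consistent with the GPU splits below once KV cache and per-GPU overhead are added on top.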
Sample settings to load in TabbyAPI @ 1.5bpw on 3x NVIDIA Tesla P100 16 GB at 4K context (~14 tok/s):
```json
{
  "name": "machinez_zephyr-orpo-141b-A35b-v0.1_1.5bpw",
  "max_seq_len": 4096,
  "override_base_seq_len": 4096,
  "gpu_split_auto": false,
  "autosplit_reserve": [
    96
  ],
  "gpu_split": [
    14.15,
    14,
    15
  ],
  "rope_scale": 1,
  "rope_alpha": 1,
  "no_flash_attention": false,
  "cache_mode": "fp16",
  "prompt_template": "string",
  "num_experts_per_token": 0,
  "use_cfg": true,
  "fasttensors": false,
  "skip_queue": false
}
```
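One way to apply these settings is to POST the JSON to a running TabbyAPI instance. This is a sketch only, assuming TabbyAPI's admin /v1/model/load endpoint, the default host/port, an x-admin-key header, and a hypothetical load_1.5bpw.json file containing the block above; verify the endpoint and headers against your TabbyAPI version's API docs.

```shell
# sketch: POST the load settings above to a running TabbyAPI instance
# (default host/port and x-admin-key header assumed; check your version)
curl http://127.0.0.1:5000/v1/model/load \
  -H "Content-Type: application/json" \
  -H "x-admin-key: $TABBY_ADMIN_KEY" \
  -d @load_1.5bpw.json
```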
Sample settings to load in TabbyAPI @ 2.75bpw on 4x NVIDIA Tesla P100 16 GB at 16K context (~5.6 tok/s):
```json
{
  "name": "machinez_zephyr-orpo-141b-A35b-v0.1_2.75bpw",
  "max_seq_len": 16384,
  "override_base_seq_len": 16384,
  "gpu_split_auto": false,
  "autosplit_reserve": [
    96
  ],
  "gpu_split": [
    12.5,
    13,
    13,
    16.1
  ],
  "rope_scale": 1,
  "rope_alpha": 1,
  "no_flash_attention": false,
  "cache_mode": "fp16",
  "prompt_template": "string",
  "num_experts_per_token": 0,
  "use_cfg": true,
  "fasttensors": false,
  "skip_queue": false
}
```
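Once a model is loaded, a quick sanity check is a minimal completion request. This assumes TabbyAPI's OpenAI-compatible /v1/completions endpoint on the default port and bearer-token auth; your version may expect an x-api-key header instead, so check its docs.

```shell
# sketch: minimal completion request against the OpenAI-compatible endpoint
curl http://127.0.0.1:5000/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TABBY_API_KEY" \
  -d '{"prompt": "The capital of France is", "max_tokens": 8}'
```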
Download instructions
With git:

```shell
git clone --single-branch --branch 2_75 https://huggingface.co./machinez/zephyr-orpo-141b-A35b-v0.1-exl2
```
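Note that Hugging Face stores the large weight files with Git LFS, so make sure LFS is set up before cloning (a standard one-time setup, not specific to this repo):

```shell
# one-time setup so git clone actually pulls the large weight files
git lfs install
```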
With the huggingface_hub CLI (credit to TheBloke for the instructions, borrowed from bartowski):

```shell
pip3 install -U "huggingface_hub[cli]"
```
Optionally, store your credentials and log in (only needed for gated or private repos):

```shell
git config --global credential.helper 'store --file ~/.my-credentials'
huggingface-cli login
```
To download the main branch (only useful if you only care about measurement.json) to a folder called machinez_zephyr-orpo-141b-A35b-v0.1-exl2:

```shell
mkdir machinez_zephyr-orpo-141b-A35b-v0.1-exl2
huggingface-cli download machinez/zephyr-orpo-141b-A35b-v0.1-exl2 --local-dir machinez_zephyr-orpo-141b-A35b-v0.1-exl2 --local-dir-use-symlinks False
```
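The measurement.json lets exllamav2 skip the measurement pass when producing new quantizations. A minimal sketch, assuming exllamav2's convert.py interface and hypothetical local paths; verify the flags against your exllamav2 checkout:

```shell
# sketch: reuse the shipped measurement.json to quantize a new bpw target
# -i: original FP16 model (hypothetical local path)
# -o: scratch/working directory
# -cf: compiled output directory (hypothetical path)
# -m: existing measurement file, skips the measurement pass
# -b: target bits per weight
python convert.py \
  -i /path/to/zephyr-orpo-141b-A35b-v0.1 \
  -o /tmp/exl2-work \
  -cf /path/to/zephyr-orpo-141b-exl2-3.0bpw \
  -m machinez_zephyr-orpo-141b-A35b-v0.1-exl2/measurement.json \
  -b 3.0
```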
To download from a different branch, add the --revision parameter:

```shell
mkdir machinez_zephyr-orpo-141b-A35b-v0.1-exl2_2.75bpw
huggingface-cli download machinez/zephyr-orpo-141b-A35b-v0.1-exl2 --revision 2_75 --local-dir machinez_zephyr-orpo-141b-A35b-v0.1-exl2_2.75bpw --local-dir-use-symlinks False
```
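On fast connections, the download can be accelerated with the optional hf_transfer backend (a standard huggingface_hub extra, not specific to this repo):

```shell
# optional: faster downloads on high-bandwidth connections
pip3 install hf_transfer
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download machinez/zephyr-orpo-141b-A35b-v0.1-exl2 --revision 2_75 --local-dir machinez_zephyr-orpo-141b-A35b-v0.1-exl2_2.75bpw --local-dir-use-symlinks False
```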
Base model: mistral-community/Mixtral-8x22B-v0.1