EXL2 quantization of Undi95's MXLewd-L2-20B.

Model details

First attempt at quantizing a 20B model so that it runs in 16GB of VRAM at the highest possible quality. Quantized at 3.18bpw (bits per weight) with 6 head bits (hb 6). An 8.13bpw version is also available for those who want it: EXL2 is very fast with flash-attention, and the quality is almost the same as fp16.
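To see why 3.18bpw is a plausible target for a 16GB card, the weight footprint can be estimated from the bits-per-weight figure. The sketch below is a rough back-of-the-envelope calculation, not a measurement: it assumes exactly 20 billion parameters (the real count may differ), and it ignores the KV cache, activations, and framework overhead, all of which need additional VRAM. The function name `weight_size_gb` is made up for this illustration.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: a "20B" model has exactly 20e9 parameters.
N_PARAMS = 20e9

for label, bpw in [("3.18bpw", 3.18), ("8.13bpw", 8.13), ("fp16", 16.0)]:
    print(f"{label}: ~{weight_size_gb(N_PARAMS, bpw):.2f} GB of weights")
```

Under these assumptions the 3.18bpw weights take roughly half of a 16GB card, which leaves room for context, while the 8.13bpw weights alone exceed 16GB.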

Perplexity (dataset: wikitext):

Base = 6.4744

8bpw h8 = 6.4471

3.18bpw h6 = 6.5705
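The perplexity figures are easier to compare as a relative change against the base model. The snippet below only does arithmetic on the three numbers reported above; the helper name `relative_change_pct` is invented for this example.

```python
# Perplexity values as reported in this model card (wikitext).
BASE_PPL = 6.4744
QUANT_PPL = {
    "8bpw h8": 6.4471,
    "3.18bpw h6": 6.5705,
}


def relative_change_pct(ppl: float, base: float = BASE_PPL) -> float:
    """Percentage change in perplexity relative to the base model."""
    return (ppl - base) / base * 100


for name, ppl in QUANT_PPL.items():
    print(f"{name}: {relative_change_pct(ppl):+.2f}% vs base")
```

The 3.18bpw quant comes out about 1.5% above the base perplexity, and the 8bpw quant lands marginally below it, a difference small enough that it is probably within measurement noise.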

Prompt Format

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
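As a minimal sketch, the prompt format above can be filled in with plain string formatting. The template text is copied from this card; the names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical, and whether a trailing newline after `### Response:` is required is an assumption, since the card does not say.

```python
# Template copied from the "Prompt Format" section of this card.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "\n"
    "### Instruction:\n"
    "{prompt}\n"
    "\n"
    "### Response:\n"  # assumption: generation starts on the next line
)


def build_prompt(instruction: str) -> str:
    """Insert the user's instruction into the prompt template."""
    return PROMPT_TEMPLATE.format(prompt=instruction.strip())


if __name__ == "__main__":
    print(build_prompt("Write a short poem about the sea."))
```

The resulting string is what gets passed to whichever EXL2-capable loader is used for generation.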