Athene-V2-Chat-72B: Rivaling GPT-4o across Benchmarks

4-bit AWQ-quantized version of Nexusflow/Athene-V2-Chat.
Quantization code
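This is not the author's quantization script. It is a rough, hedged illustration of the weight format that a "4-bit AWQ" checkpoint stores. The sketch applies group-wise asymmetric (zero-point) round-to-nearest quantization to a single weight row in pure Python. The group size of 128 is an assumption: it is a common AWQ setting, and this card does not confirm it. The activation-aware part of AWQ, which searches per-channel scales on calibration data before rounding, is deliberately left out. `quantize_group`, `dequantize_group`, and `quantize_row` are hypothetical helper names.

```python
import random
from typing import List, Tuple

BITS = 4                  # 4-bit weights, as in this AWQ checkpoint
QMAX = (1 << BITS) - 1    # largest unsigned 4-bit code: 15
GROUP_SIZE = 128          # assumed group size (common AWQ setting, not confirmed by the card)


def quantize_group(weights: List[float]) -> Tuple[List[int], float, int]:
    """Round-to-nearest asymmetric quantization of one group to integer codes in [0, QMAX]."""
    # Widen the range to include 0.0 so the zero point itself is a valid 4-bit code.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / QMAX if w_max > w_min else 1.0
    zero = round(-w_min / scale)  # integer code that represents the real value 0.0
    codes = [min(QMAX, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: int) -> List[float]:
    """Map 4-bit codes back to approximate floating-point weights."""
    return [(c - zero) * scale for c in codes]


def quantize_row(row: List[float], group_size: int = GROUP_SIZE):
    """Split a weight row into contiguous groups and quantize each group independently."""
    return [quantize_group(row[i:i + group_size]) for i in range(0, len(row), group_size)]


if __name__ == "__main__":
    random.seed(0)
    row = [random.gauss(0.0, 0.02) for _ in range(300)]  # synthetic weights, not real model data
    for idx, (codes, scale, zero) in enumerate(quantize_row(row)):
        original = row[idx * GROUP_SIZE:(idx + 1) * GROUP_SIZE]
        restored = dequantize_group(codes, scale, zero)
        max_err = max(abs(a - b) for a, b in zip(original, restored))
        print(f"group {idx}: {len(codes)} weights, scale={scale:.5f}, zero={zero}, max error={max_err:.5f}")
```

Each group keeps its own scale and zero point, so a single outlier only coarsens the 128 weights around it. In a scheme like this, the rounding error per weight stays within half a quantization step.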

Evaluation of the AWQ version

Evaluation results on ZebraLogic

│              Model               │  Mode  │  N_Mode  │  N_Size  │  Puzzle Acc  │  Easy Puzzle Acc  │  Hard Puzzle Acc  │  Cell Acc  │  No answer  │  Total Puzzles  │  Reason Lens  │
│      o1-preview-2024-09-12       │ greedy │  single  │    1     │     71.4     │       98.57       │       60.83       │   75.14    │     0.3     │      1000       │    1565.88    │
│    claude-3-5-sonnet-20241022    │ greedy │  single  │    1     │     36.2     │       91.07       │       14.86       │   54.27    │      0      │      1000       │    861.18     │
│ Llama-3.1-405B-Inst-fp8@together │ greedy │  single  │    1     │     32.6     │       87.14       │       11.39       │    45.8    │    12.5     │      1000       │    314.66     │
│        Athene-V2-Chat-AWQ        │ greedy │  single  │    1     │     27.8     │       77.14       │       8.61        │   45.83    │     6.4     │      1000       │    1785.7     │
│       Qwen2.5-72B-Instruct       │ greedy │  single  │    1     │     26.6     │       76.43       │       7.22        │   40.92    │    11.9     │      1000       │    1795.9     │
│       Qwen2.5-32B-Instruct       │ greedy │  single  │    1     │     26.1     │       77.5        │       6.11        │   43.39    │     6.3     │      1000       │    1333.07    │
│            Athene-70B            │ greedy │  single  │    1     │     16.7     │       52.5        │       2.78        │   32.98    │    21.1     │      1000       │    391.19     │
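The overall Puzzle Acc column appears to be a puzzle-count-weighted average of the easy and hard accuracies. The card does not say how many of the 1000 puzzles count as easy. The split of 280 easy and 720 hard used below is therefore an assumption, inferred because it reproduces every row of the table to within rounding. The short Python check below copies the rows from the table; `overall_accuracy` is a hypothetical helper.

```python
from typing import List, Tuple

# (model, easy puzzle acc %, hard puzzle acc %, reported overall puzzle acc %), copied from the table above
ROWS: List[Tuple[str, float, float, float]] = [
    ("o1-preview-2024-09-12", 98.57, 60.83, 71.4),
    ("claude-3-5-sonnet-20241022", 91.07, 14.86, 36.2),
    ("Llama-3.1-405B-Inst-fp8@together", 87.14, 11.39, 32.6),
    ("Athene-V2-Chat-AWQ", 77.14, 8.61, 27.8),
    ("Qwen2.5-72B-Instruct", 76.43, 7.22, 26.6),
    ("Qwen2.5-32B-Instruct", 77.5, 6.11, 26.1),
    ("Athene-70B", 52.5, 2.78, 16.7),
]

# Assumed easy/hard split of the 1000 puzzles: inferred from the table, not stated in the card.
N_EASY, N_HARD = 280, 720


def overall_accuracy(easy_acc: float, hard_acc: float,
                     n_easy: int = N_EASY, n_hard: int = N_HARD) -> float:
    """Puzzle-count-weighted average of the easy and hard accuracies, in percent."""
    solved = easy_acc / 100 * n_easy + hard_acc / 100 * n_hard
    return 100 * solved / (n_easy + n_hard)


for name, easy, hard, reported in ROWS:
    print(f"{name:34s} reported {reported:5.1f}  recomputed {overall_accuracy(easy, hard):6.2f}")
```

Under this split, Athene-V2-Chat-AWQ and Qwen2.5-72B-Instruct solve about the same number of easy puzzles (roughly 216 vs 214). Most of the AWQ model's overall lead (27.8 vs 26.6) comes from the hard set (roughly 62 vs 52 puzzles).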

Model repository: radm/Athene-V2-Chat-AWQ