metadata

language:
  - en
  - de
  - fr
  - it
  - pt
  - hi
license: llama3.1
library_name: transformers
pipeline_tag: text-classification
tags:
  - facebook
  - meta
  - pytorch
  - llama
  - brand-safety
  - classification
model-index:
  - name: vision-1-mini
    results:
      - task:
          type: text-classification
          name: Brand Safety Classification
        metrics:
          - type: accuracy
            value: 0.95
            name: Classification Accuracy
datasets:
  - BrandSafe-16k
metrics:
  - accuracy
base_model: meta-llama/Llama-3.1-8B-Instruct
model_size: 4.58 GiB
parameters: 8.03B
quantization: Q4_K (GGUF V3)
architectures:
  - LlamaForCausalLM
model_parameters:
  block_count: 32
  context_length: 131072
  embedding_length: 4096
  feed_forward_length: 14336
  attention_heads: 32
  kv_heads: 8
  rope_freq_base: 500000
  vocab_size: 128256
hardware:
  recommended: Apple Silicon
  memory:
    cpu_kv_cache: 992.00 MiB
    metal_kv_cache: 32.00 MiB
    metal_compute: 560.00 MiB
    cpu_compute: 560.01 MiB
inference:
  load_time: 3.27s
  device: Metal (Apple M3 Pro)
  memory_footprint:
    cpu: 4552.80 MiB
    metal: 132.50 MiB

vision-1-mini

Vision-1-mini is an 8B-parameter model based on Llama 3.1 and fine-tuned for brand safety classification. It is quantized to Q4_K, tuned for Apple Silicon devices, and labels content using the BrandSafe-16k classification system.

Model Details

Model Type: Brand Safety Classifier
Base Model: Meta Llama 3.1 8B Instruct
Parameters: 8.03 billion
Architecture: Llama
Quantization: Q4_K
Size: 4.58 GiB (4.89 BPW)
License: Llama 3.1
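
The file size and the bits-per-weight (BPW) figure above are related by simple arithmetic. The sketch below recomputes BPW from the rounded size and parameter count listed here; `bits_per_weight` is a hypothetical helper name, and because both inputs are rounded the result only matches the quoted 4.89 to within rounding.

```python
GIB = 1024 ** 3  # bytes in one GiB


def bits_per_weight(size_gib: float, n_params: float) -> float:
    """Average number of bits stored per parameter for a file of the given size."""
    return size_gib * GIB * 8 / n_params


# Inputs are the rounded values from this card: 4.58 GiB and 8.03 billion parameters.
bpw = bits_per_weight(4.58, 8.03e9)
print(f"{bpw:.2f} bits per weight")
```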

Performance Metrics

Load Time: 3.27 seconds (on Apple M3 Pro)
Memory Usage:
- CPU Buffer: 4552.80 MiB
- Metal Buffer: 132.50 MiB
- KV Cache: 1024.00 MiB (512.00 MiB K, 512.00 MiB V)
- Compute Buffer: 560.00 MiB
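
The KV cache figure can be related to the architecture values on this card with a short calculation. This is a sketch that assumes 16-bit cache entries (the card does not state the cache data type); `kv_cache_bytes` is a hypothetical helper name.

```python
MIB = 1024 ** 2  # bytes in one MiB


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_value=2):
    """Combined size of the K and V caches in bytes.

    bytes_per_value=2 assumes 16-bit cache entries (an assumption, not from the card).
    """
    per_tensor = n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value
    return 2 * per_tensor  # one tensor for K, one for V


head_dim = 4096 // 32  # embedding length / attention heads = 128
per_token = kv_cache_bytes(32, 8, head_dim, n_tokens=1)
tokens_in_1024_mib = 1024 * MIB // per_token

print(per_token // 1024, "KiB of cache per token")
print(tokens_in_1024_mib, "token positions fit in 1024 MiB")
```

Under the 16-bit assumption, the reported 1024.00 MiB corresponds to more cached positions than a single 2048-token context needs. The card does not say why; a runtime that reserves cache for several sequences at once would be one possible explanation.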

Hardware Compatibility

Apple Silicon Optimizations

Optimized for Metal/MPS
Unified Memory Architecture support
SIMD group reduction and matrix multiplication optimizations
Layer offloading to the GPU (1 of 33 layers in the measured configuration)

System Requirements

Recommended Memory: 12 GB or more
GPU: Apple Silicon preferred (M1/M2/M3 series)
Storage: 5 GB free space

Classification Categories

The model classifies content into the following categories:

B1-PROFANITY - Contains profane or vulgar language
B2-OFFENSIVE_SLANG - Contains offensive slang or derogatory terms
B3-COMPETITOR - Mentions or promotes competing brands
B4-BRAND_CRITICISM - Contains criticism or negative feedback about brands
B5-MISLEADING - Contains misleading or deceptive information
B6-POLITICAL - Contains political content or bias
B7-RELIGIOUS - Contains religious content or references
B8-CONTROVERSIAL - Contains controversial topics or discussions
B9-ADULT - Contains adult or mature content
B10-VIOLENCE - Contains violent content or references
B11-SUBSTANCE - Contains references to drugs, alcohol, or substances
B12-HATE - Contains hate speech or discriminatory content
B13-STEREOTYPE - Contains stereotypical representations
B14-BIAS - Shows bias against groups or individuals
B15-UNPROFESSIONAL - Contains unprofessional content or behavior
B16-MANIPULATION - Contains manipulative content or tactics
SAFE - Contains no brand safety concerns
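
Downstream code usually needs to map the model's raw text output back onto one of these labels. The following is a minimal sketch of such a mapping; the `CATEGORIES` table mirrors the list above, while `parse_label` and its matching rules are illustrative assumptions rather than part of the model's published interface.

```python
import re
from typing import Optional

# Category codes and names copied from the list above.
CATEGORIES = {
    1: "PROFANITY", 2: "OFFENSIVE_SLANG", 3: "COMPETITOR", 4: "BRAND_CRITICISM",
    5: "MISLEADING", 6: "POLITICAL", 7: "RELIGIOUS", 8: "CONTROVERSIAL",
    9: "ADULT", 10: "VIOLENCE", 11: "SUBSTANCE", 12: "HATE",
    13: "STEREOTYPE", 14: "BIAS", 15: "UNPROFESSIONAL", 16: "MANIPULATION",
}


def parse_label(raw: str) -> Optional[str]:
    """Return the canonical label found in raw model output, or None if unrecognised."""
    text = raw.strip().upper()
    # Look for a code B1..B16 that is not part of a longer number such as B17.
    match = re.search(r"\bB(1[0-6]|[1-9])\b", text)
    if match:
        code = int(match.group(1))
        return f"B{code}-{CATEGORIES[code]}"
    if re.search(r"\bSAFE\b", text):
        return "SAFE"
    return None


print(parse_label("B1-PROFANITY"))
print(parse_label(" b12 "))
print(parse_label("safe"))
```

Returning `None` for unrecognised output lets the caller decide whether to retry, fall back to a default, or flag the item for review.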

Usage

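import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "maxsonderby/vision-1-mini",
    device_map="auto",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained("maxsonderby/vision-1-mini")

# Example usage
text = "Your text here"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1,  # value given on this card; raise it if the label is cut off
    do_sample=True,    # needed for temperature and top_p to take effect
    temperature=0.1,
    top_p=0.9,
)
# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[1]
result = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)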

Model Architecture

Attention Mechanism:
- Head Count: 32
- KV Head Count: 8
- Layer Count: 32
- Embedding Length: 4096
- Feed Forward Length: 14336
- Context Length: 2048 (reduced from the base model's 131072)
- RoPE Base Frequency: 500000
- RoPE Dimension Count: 128 (the per-head dimension, 4096 / 32)
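
The 8.03 billion parameter figure can be cross-checked from the values above together with the vocabulary size of 128256 from the metadata. The sketch below assumes the standard Llama layout: grouped-query attention without biases, a gated feed-forward block with three projections, two normalization vectors per layer, one final normalization vector, and separate input and output embedding matrices. `llama_param_count` is a hypothetical helper name.

```python
def llama_param_count(vocab, d_model, n_layers, n_heads, n_kv_heads, d_ff,
                      tied_embeddings=False):
    """Approximate parameter count for a Llama-style decoder (assumed layout)."""
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim              # width of the K and V projections
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim  # Q and O, plus K and V
    mlp = 3 * d_model * d_ff                    # gate, up and down projections
    norms = 2 * d_model                         # two normalization vectors per layer
    per_layer = attn + mlp + norms
    embeddings = vocab * d_model * (1 if tied_embeddings else 2)
    return n_layers * per_layer + embeddings + d_model  # + final normalization


total = llama_param_count(
    vocab=128256, d_model=4096, n_layers=32, n_heads=32, n_kv_heads=8, d_ff=14336
)
print(f"{total:,} parameters (about {total / 1e9:.2f}B)")
print("query heads per KV head:", 32 // 8)
```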

Training & Fine-tuning

This model is fine-tuned for brand safety classification on the BrandSafe-16k dataset. It runs with a reduced context window of 2048 tokens and is configured for low-variance, near-deterministic outputs with the following inference settings:

Temperature: 0.1
Top-p: 0.9
Batch Size: 512
Thread Count: 8
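
To illustrate why a temperature of 0.1 gives near-deterministic output, the sketch below applies temperature scaling to a small set of made-up logits. The logit values and the helper name `softmax_with_temperature` are illustrative assumptions, not values taken from the model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
p_default = softmax_with_temperature(logits, 1.0)
p_low = softmax_with_temperature(logits, 0.1)

print("temperature 1.0:", [round(p, 4) for p in p_default])
print("temperature 0.1:", [round(p, 4) for p in p_low])
```

At the lower temperature almost all probability mass moves to the highest-scoring token, so sampling rarely picks anything else.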

Limitations

The model is optimized for shorter content classification (up to 2048 tokens)
Performance may vary on non-Apple Silicon hardware
The model focuses solely on brand safety classification and may not be suitable for other tasks
Classification accuracy may vary based on content complexity and context

Citation

If you use this model in your research, please cite:

@misc{vision-1-mini,
  author = {Max Sonderby},
  title = {Vision-1-Mini: Optimized Brand Safety Classification Model},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/maxsonderby/vision-1-mini}}
}