This repository hosts a fork of togethercomputer/LLaMA-2-7B-32K's `modeling_flash_llama.py`, with a fix for the padding of attention weights merged in.
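For context, the general technique behind padding fixes in attention is to mask out padded key positions before the softmax, so padded tokens receive zero attention weight and cannot skew the normalization. The sketch below is illustrative only (it is not the repository's actual patch, and the function name `masked_softmax` is hypothetical):

```python
import numpy as np

def masked_softmax(scores, attention_mask):
    """Illustrative sketch, not the repository's patch: attention scores at
    padded key positions (mask == 0) are set to -inf before softmax, so
    those positions contribute exactly zero weight after normalization."""
    scores = np.where(attention_mask.astype(bool), scores, -np.inf)
    # Subtract the row max for numerical stability; exp(-inf) becomes 0.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)
```

Without such masking, padded positions would receive nonzero probability mass and dilute the weights assigned to real tokens.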