---
base_model: unsloth/llama-3-8b-bnb-4bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
---

# Uploaded model

- **Developed by:** tykiww
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

---------------------------------------------

# Setting up and testing your own Endpoint Handler

Sources:

- https://www.philschmid.de/custom-inference-handler
- https://discuss.huggingface.co/t/model-wont-load-on-custom-inference-endpoint/91780
- https://huggingface.co/docs/inference-endpoints/guides/custom_handler

### Setup Environment

Install the packages needed to set up and test the endpoint handler.

```
# install git-lfs to interact with the repository
sudo apt-get update
sudo apt-get install git-lfs

# install transformers (not needed for inference since it is installed by default in the container)
pip install transformers[sklearn,sentencepiece,audio,vision]
```

Clone the model weights of interest.

```
git lfs install
git clone https://huggingface.co/tykiww/llama3-8b-quantized
```

Log in to Hugging Face.

```
# set up the CLI with your token
huggingface-cli login
git config --global credential.helper store
```

Confirm the login if you are unsure.

```
huggingface-cli whoami
```

Navigate to the cloned repository and create a handler.py file.

```
cd llama3-8b-quantized
touch handler.py
```

Create a requirements.txt file with the following items.

```
huggingface_hub
unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git
xformers
trl<0.9.0
peft==0.11.1
bitsandbytes
transformers==4.41.2
```

You must have a GPU compatible with Unsloth.
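Before going further, it helps to verify that the visible GPU meets Unsloth's requirements. A minimal check, assuming Unsloth's documented minimum of CUDA compute capability 7.0 (verify the exact threshold against the Unsloth README):

```
# check that a CUDA GPU is visible and recent enough for Unsloth
import torch

assert torch.cuda.is_available(), "No CUDA GPU visible"
major, minor = torch.cuda.get_device_capability()
print(f"GPU: {torch.cuda.get_device_name(0)} (compute capability {major}.{minor})")
# assumption: compute capability 7.0 is the minimum Unsloth supports
assert (major, minor) >= (7, 0), "GPU is likely too old for Unsloth"
```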
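The handler.py file must expose an `EndpointHandler` class with `__init__(self, path)` and `__call__(self, data)` methods; that interface is the documented contract from the custom handler guides linked above. The sketch below fills it in for this model. The Unsloth loading details (`max_seq_length`, 4-bit loading, the generation defaults) are illustrative assumptions to adapt to your fine-tune.

```
# handler.py — minimal sketch of a custom Inference Endpoints handler
from typing import Any, Dict, List

from unsloth import FastLanguageModel


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` is the repository root inside the container, i.e. the
        # directory holding the model weights and this file
        self.model, self.tokenizer = FastLanguageModel.from_pretrained(
            model_name=path,
            max_seq_length=2048,   # assumption: match your fine-tune
            load_in_4bit=True,
        )
        FastLanguageModel.for_inference(self.model)  # enable inference mode

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
        # Inference Endpoints sends payloads shaped like
        # {"inputs": ..., "parameters": {...}}
        prompt = data["inputs"]
        params = data.get("parameters", {})

        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        outputs = self.model.generate(
            **inputs,
            max_new_tokens=params.get("max_new_tokens", 256),
        )
        text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return [{"generated_text": text}]
```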
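To test the handler locally before pushing, import it and call it with a request-shaped payload, following the pattern in the philschmid guide linked above. The prompt and parameters here are placeholders; run this from inside the cloned repository.

```
# quick local smoke test for handler.py
from handler import EndpointHandler

# "." assumes the weights and handler.py live in the current directory
handler = EndpointHandler(path=".")

payload = {
    "inputs": "What is the capital of France?",
    "parameters": {"max_new_tokens": 64},
}
print(handler(payload))
```

Once the local test passes, commit handler.py and requirements.txt and push them to the repository so the endpoint picks them up on creation.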