---
license: llama2
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- inferentia2
- neuron
---
# Neuronx model for [codellama/CodeLlama-7b-hf](https://huggingface.co./codellama/CodeLlama-7b-hf)

This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [codellama/CodeLlama-7b-hf](https://huggingface.co./codellama/CodeLlama-7b-hf). You can find detailed information about the base model on its [Model Card](https://huggingface.co./codellama/CodeLlama-7b-hf).

This model has been exported to the `neuron` format using specific `input_shapes` and `compiler` parameters detailed in the sections below. It has been compiled to run on an inf2.8xlarge instance on AWS. Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co./docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.
## Usage on Amazon SageMaker

_coming soon_

## Usage with 🤗 `optimum-neuron`

```python
>>> from optimum.neuron import pipeline

>>> p = pipeline('text-generation', 'jburtoft/CodeLlama-7b-hf-neuron-8xlarge')
>>> p("import socket\n\ndef ping_exponential_backoff(host: str):",
...   do_sample=True,
...   top_k=10,
...   temperature=0.1,
...   top_p=0.95,
...   num_return_sequences=1,
...   max_length=200,
... )
```
```
[{'generated_text': 'import socket\n\ndef ping_exponential_backoff(host: str):\n    """\n    Ping a host with exponential backoff.\n\n    :param host: Host to ping\n    :return: True if host is reachable, False otherwise\n    """\n    for i in range(1, 10):\n        try:\n            socket.create_connection((host, 80), 1).close()\n            return True\n        except OSError:\n            time.sleep(2 ** i)\n    return False\n\n\ndef ping_exponential_backoff_with_timeout(host: str, timeout: int):\n    """\n    Ping a host with exponential backoff and timeout.\n\n    :param host: Host to ping\n    :param timeout: Timeout in seconds\n    :return: True if host is reachable, False otherwise\n    """\n    for'}]
```

This repository contains tags specific to versions of `neuronx`. When using this model with 🤗 `optimum-neuron`, load the repo revision that matches the version of `neuronx` you are running, so that the right serialized checkpoints are picked up.

## Arguments passed during export

**input_shapes**

```json
{
  "batch_size": 1,
  "sequence_length": 2048
}
```

**compiler_args**

```json
{
  "auto_cast_type": "fp16",
  "num_cores": 2
}
```
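As a rough sketch, an export with these parameters could be reproduced from the base model using the `optimum-cli` neuron exporter; the flag names below follow the `optimum-neuron` export guide, and the output directory name is illustrative:

```shell
# Hedged sketch: re-exporting the base model with the parameters above.
# Requires optimum-neuron installed on an Inferentia2 (inf2) instance;
# the output directory "CodeLlama-7b-hf-neuron/" is a placeholder.
optimum-cli export neuron \
  --model codellama/CodeLlama-7b-hf \
  --batch_size 1 \
  --sequence_length 2048 \
  --num_cores 2 \
  --auto_cast_type fp16 \
  CodeLlama-7b-hf-neuron/
```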