|
# Test that the Open LLM is running |
|
|
|
First, start the server using only the CPU:
|
|
|
```bash
export model_path="TheBloke/CodeLlama-13B-GGUF/codellama-13b.Q8_0.gguf"
python -m llama_cpp.server --model $model_path
```
|
|
|
Or with GPU support (recommended): |
|
|
|
```bash
python -m llama_cpp.server --model TheBloke/CodeLlama-13B-GGUF/codellama-13b.Q8_0.gguf --n_gpu_layers 1
```
|
|
|
If you have more GPU layers available, set `--n_gpu_layers` to a higher number.
|
|
|
To find the number of available layers, run the above command and look for `llm_load_tensors: offloaded 1/41 layers to GPU` in the output.
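
Once you know the total layer count, you can offload all of them. The command below is an example that assumes the 41 layers reported above and the `model_path` variable exported earlier:

```bash
# Example: offload all 41 layers reported for this model
python -m llama_cpp.server --model $model_path --n_gpu_layers 41
```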
|
|
|
## Test API call |
|
|
|
Set the environment variables: |
|
|
|
```bash
export OPENAI_API_BASE="http://localhost:8000/v1"
export OPENAI_API_KEY="sk-xxx"
export MODEL_NAME="CodeLlama"
```
|
|
|
Then ping the model from `python` using the `OpenAI` API:
|
|
|
```bash
python examples/open_llms/openai_api_interface.py
```
|
|
|
If you're not using `CodeLlama`, make sure to change the `MODEL_NAME` environment variable.
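
If you prefer to make the call directly instead of running the example script, here is a minimal sketch using the `openai` Python package (v1+); it is an illustration, not the contents of `openai_api_interface.py`:

```python
import os

from openai import OpenAI

# Point the client at the local llama-cpp server using the variables exported above.
client = OpenAI(
    base_url=os.environ["OPENAI_API_BASE"],
    api_key=os.environ["OPENAI_API_KEY"],
)

response = client.chat.completions.create(
    model=os.environ["MODEL_NAME"],
    messages=[{"role": "user", "content": "Who are you?"}],
    max_tokens=60,
)

print(response.choices[0].message.content)
```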
|
|
|
Or using `curl`: |
|
|
|
```bash
curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --header "Content-Type: application/json" \
  --data '{ "model": "CodeLlama", "messages": [{"role": "user", "content": "Who are you?"}], "max_tokens": 60}'
```
|
|
|
If this works, also make sure that the `langchain` interface works, since that's how `gpte` interacts with LLMs.
|
|
|
## Langchain test |
|
|
|
```bash
export MODEL_NAME="CodeLlama"
python examples/open_llms/langchain_interface.py
```
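
For reference, here is a minimal sketch of the same call made through LangChain, assuming the `langchain-openai` package is installed (an illustration, not the contents of `langchain_interface.py`):

```python
import os

from langchain_openai import ChatOpenAI

# Reuse the OPENAI_API_BASE / OPENAI_API_KEY / MODEL_NAME variables exported above.
llm = ChatOpenAI(
    model=os.environ["MODEL_NAME"],
    base_url=os.environ["OPENAI_API_BASE"],
    api_key=os.environ["OPENAI_API_KEY"],
)

print(llm.invoke("Who are you?").content)
```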
|
|
|
That's it 🤓 time to go back [to the guide](/docs/open_models.md#running-the-example) and give `gpte` a try.
|
|