Using with open/local models
============================

**Use `gpte` first with OpenAI models to get a feel for the `gpte` tool.**

**Then go play with experimental Open LLMs πŸ‰ support and try not to get πŸ”₯!!**

At the moment the best option for coding is still the `gpt-4` models provided by OpenAI. But open models are catching up and are a good free and privacy-oriented alternative if you possess the proper hardware.

You can integrate `gpt-engineer` with open-source models by leveraging an OpenAI-compatible API.

We describe a minimal, clean solution below. It is not the only way to use open or local models with `gpt-engineer`, but it is the one we tested and would recommend to most users.

More details on why the solution below is recommended can be found in [this blog post](https://zigabrencic.com/blog/2024-02-21).

Setup
-----

As the inference engine we recommend [llama.cpp](https://github.com/ggerganov/llama.cpp) with its Python bindings `llama-cpp-python`.

We chose `llama.cpp` and `llama-cpp-python` because:

1. `llama.cpp` supports the largest number of hardware acceleration backends.
2. It supports a diverse set of open LLMs.
3. `llama-cpp-python` is written in Python, directly on top of the `llama.cpp` inference engine.
4. It exposes an OpenAI-compatible API and a `langchain` interface.

To install `llama-cpp-python`, follow the official [installation docs](https://llama-cpp-python.readthedocs.io/en/latest/), or [these docs](https://llama-cpp-python.readthedocs.io/en/latest/install/macos/) for macOS with Metal support.

If you want to benefit from proper hardware acceleration on your machine, make sure to set the appropriate compiler flags before installing the package:

- `linux`: `CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"`
- `macos` with Metal support: `CMAKE_ARGS="-DLLAMA_METAL=on"`
- `windows`: `$env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"`

This will enable the `pip` installer to compile `llama.cpp` with the proper hardware acceleration backend.

Then run:

```bash
pip install llama-cpp-python
```
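
If you prefer, the flag can also be passed inline with the install command. A minimal sketch for Linux with OpenBLAS (swap in the flag for your platform from the list above):

```bash
# Build llama.cpp with OpenBLAS acceleration while installing the Python bindings
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
```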

For our use case we also need to set up the web server that the `llama-cpp-python` library provides. To install it:

```bash
pip install 'llama-cpp-python[server]'
```

For detailed usage, consult the [`llama-cpp-python` docs](https://llama-cpp-python.readthedocs.io/en/latest/server/).

Before we proceed, we need to obtain the model weights in the `gguf` format, which should be a single file on your disk.

In case you have weights in another format, check the `llama-cpp-python` docs for how to convert them to `gguf`.

Models in other formats (`ggml`, `.safetensors`, etc.) won't work with the solution described below without prior conversion to the `gguf` file format!
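
For reference, a conversion typically looks something like the sketch below, assuming a local clone of the `llama.cpp` repository. The script name and flags have changed between releases, so treat this as illustrative only and check the current docs first:

```bash
# Illustrative only: convert Hugging Face weights to gguf using the
# conversion script shipped with llama.cpp (name/flags vary across versions)
python convert-hf-to-gguf.py /path/to/hf-model-dir --outfile model.gguf
```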

Which open model to use?
==================

Your best choice would be:

- CodeLlama 70B
- Mixtral 8x7B

We are still testing this part, but as a rule the larger the model you can run, the better. Responses will be slower in terms of tokens per second, but the code quality will be higher.

For testing that the open LLM `gpte` setup works, we recommend starting with a smaller model. You can download the weights of [CodeLlama-13B-GGUF by `TheBloke`](https://huggingface.co./TheBloke/CodeLlama-13B-GGUF); choose the largest model file you can run (for example `Q6_K`), since heavier quantisation degrades LLM performance.
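
One way to fetch the weights is with the Hugging Face CLI. A sketch below; the exact filename is an assumption, so check the repository's file listing first:

```bash
pip install -U huggingface_hub
# Download a single gguf file from the repository (filename may differ)
huggingface-cli download TheBloke/CodeLlama-13B-GGUF codellama-13b.Q6_K.gguf --local-dir ./models
```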

Feel free to try out larger models on your hardware and see what happens.

Running the Example
==================

To verify that your setup works, check the [test open LLM setup](examples/test_open_llm/README.md).

If the above tests work, proceed πŸ˜‰

To check that `gpte` works with `CodeLlama`, we recommend creating a project with the following `prompt` file content:

```
Write a python script that sums up two numbers. Provide only the `sum_two_numbers` function and nothing else.

Provide two tests:

assert(sum_two_numbers(100, 10) == 110)
assert(sum_two_numbers(10.1, 10) == 20.1)
```
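
Concretely, `gpte` reads that text from a file named `prompt` inside the project directory (the directory name below is just a placeholder):

```bash
mkdir -p projects/sum_two_numbers        # hypothetical project directory
$EDITOR projects/sum_two_numbers/prompt  # paste the prompt text above and save
```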

Now run the LLM in a separate terminal (where `$model_path` points to the `gguf` weights file you downloaded):

```bash
python -m llama_cpp.server --model $model_path --n_batch 256 --n_gpu_layers 30
```
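
You can quickly confirm the server is up by querying its OpenAI-compatible models endpoint (assuming the default host and port):

```bash
# Should return a JSON list containing the model the server was started with
curl http://localhost:8000/v1/models
```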

Then in another terminal window set the following environment variables:

```bash
export OPENAI_API_BASE="http://localhost:8000/v1"
export OPENAI_API_KEY="sk-xxx"
export MODEL_NAME="CodeLLama"
export LOCAL_MODEL=true
```
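
Optionally, you can sanity-check the endpoint with a single request before launching `gpte`. A sketch using the environment variables set above (the local server serves whichever model it was started with, so the model name is mostly informational):

```bash
curl "$OPENAI_API_BASE/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"model": "'"$MODEL_NAME"'", "messages": [{"role": "user", "content": "Say hello in one word."}]}'
```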

And run `gpt-engineer` with the following command:

```bash
gpte <project_dir> $MODEL_NAME --lite --temperature 0.1
```

The `--lite` mode is needed for now, since open models currently tend to perform worse when given too many instructions. Temperature is set to `0.1` to get the most consistent results possible.

That's it.

*If something doesn't work as expected, or you figure out how to improve the open LLM support, please let us know.*

Using Open Router models
==================

In case you don't possess the hardware to run local LLMs yourself, you can use hosting on [Open Router](https://openrouter.ai) and pay as you go for tokens.

To set it up, you need to sign in and purchase πŸ’° LLM credits. Pricing per token differs for [each model](https://openrouter.ai/models), but it is mostly cheaper than OpenAI.

Then create the API key.

To use, for example, [Meta: Llama 3 8B Instruct (extended)](https://openrouter.ai/models/meta-llama/llama-3-8b-instruct:extended) with `gpte`, we need to set:

```bash
export OPENAI_API_BASE="https://openrouter.ai/api/v1"
export OPENAI_API_KEY="sk-key-from-open-router"
export MODEL_NAME="meta-llama/llama-3-8b-instruct:extended"
export LOCAL_MODEL=true
```

And then run:

```bash
gpte <project_dir> $MODEL_NAME --lite --temperature 0.1
```

Using Azure models
==================

Set your Azure OpenAI key:
- `export OPENAI_API_KEY=[your api key]`

Then you call `gpt-engineer` with your service endpoint `--azure https://aoi-resource-name.openai.azure.com` and set your deployment name (which you created in Azure AI Studio) as the model name (the last `gpt-engineer` argument).

Example:
`gpt-engineer --azure https://myairesource.openai.azure.com ./projects/example/ my-gpt4-project-name`
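
Putting both steps together (the resource name, project path, and deployment name below are just the placeholders from the example above):

```bash
# Point gpt-engineer at an Azure OpenAI deployment
export OPENAI_API_KEY="<your Azure OpenAI key>"
gpt-engineer --azure https://myairesource.openai.azure.com ./projects/example/ my-gpt4-project-name
```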