---
license: llama2
datasets:
- marclove/llama_functions
- timdettmers/openassistant-guanaco
language:
- en
library_name: transformers
pipeline_tag: conversational
---
# Model Card for Llama-2 7B Chat Functions

‼️ This model is still in a beta state. It will be retrained and updated at a future date, during which its prompting format may change. If you need to depend on it in its current state, please create your own fork and provide attribution to this original repository. ‼️

Llama Functions is a further fine-tuned version of [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), trained on a 50/50 mix of (1) synthetic OpenAPI function calls and (2) chat completions from the [Guanaco subset of the OASST1 dataset](https://huggingface.co/datasets/timdettmers/openassistant-guanaco). 13B & 70B versions are coming soon.

The function calling dataset is mixed with Guanaco in order to maintain accuracy and helpfulness when calling a function is not the appropriate response. Guidelines for use, more detailed information about limitations, and evaluation stats for the 7B, 13B, and 70B models are forthcoming.

There is no existing evaluation benchmark for measuring the accuracy of function calls, which makes it hard to identify during training when we've maximized the balance between function calling accuracy and chat performance. I'm working on a custom HF eval for this purpose, but until then I have chosen to mix the two datasets in equal parts so that the eval & test stats during fine-tuning serve as a proxy for performance on both tasks. The current checkpoint is at 1,000 steps, where eval & test loss reached their lowest point.

- **Developed by:** Marc Love
- **License:** [Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)](https://creativecommons.org/licenses/by-sa/4.0/)
- **Finetuned from:** [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)

### Model Sources

- **Repository:** Coming soon
- **Demo:** [llama2-7b-chat-functions](https://huggingface.co/spaces/marclove/llama2-7b-chat-functions)

## Uses

Please note that the synthetic portion of the dataset was generated using OpenAI models. Depending on your use case, this may affect your ability to use the dataset and this model.
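This fine-tune's function-calling prompt format is not yet documented (and, per the note above, may change). As an illustration only, here is a minimal sketch of how one might embed function descriptions in a system prompt using the standard Llama-2 chat template from the base model; the `get_weather` schema is a hypothetical example, not this model's actual format:

```python
# Illustrative sketch only: uses the base model's standard Llama-2 chat
# template; the function schema below is hypothetical, not this model's
# documented function-calling format.
import json

def build_prompt(system_message: str, user_message: str) -> str:
    """Wrap a system + user message in the Llama-2 chat template."""
    return f"<s>[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n{user_message} [/INST]"

# A hypothetical OpenAPI-style function description placed in the system prompt.
functions = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {"city": {"type": "string", "required": True}},
}]

system = "You may call these functions:\n" + json.dumps(functions, indent=2)
prompt = build_prompt(system, "What's the weather in Berlin?")
```

The resulting string would then be tokenized and passed to the model as usual with `transformers`.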

## Bias, Risks, and Limitations

This model introduces no additional bias beyond that of the underlying model, [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), and the biases introduced by the [Guanaco subset of the OASST1 dataset](https://huggingface.co/datasets/timdettmers/openassistant-guanaco).

This model can hallucinate function calls that do not exist in the system prompt. While I hope to improve this by iterating on the `llama_functions` dataset, the 7B model will likely continue to struggle with this. I'm hoping to see more accuracy and less hallucination in larger models and plan to experiment with inference strategies, such as [grammar-based sampling](https://github.com/ggerganov/llama.cpp/pull/1773) and classifier-based routing, to improve performance in smaller models.

At a minimum, I encourage you to validate model outputs before using them to call any functions. For example, several people have found Pydantic to be a convenient way to both describe functions and validate calls prior to execution.
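As a minimal sketch of that validation step (the `GetWeather` schema and the raw output strings are hypothetical examples, not this model's actual format):

```python
# Minimal sketch: validate a model's function-call output with Pydantic before
# executing anything. The schema and raw strings here are hypothetical.
import json
from typing import Optional

from pydantic import BaseModel, ValidationError

class GetWeather(BaseModel):
    """Schema for a hypothetical get_weather function call."""
    city: str
    unit: str = "celsius"  # optional, with a default

def parse_call(raw: str) -> Optional[GetWeather]:
    """Return a validated call, or None if the output is malformed."""
    try:
        return GetWeather(**json.loads(raw))
    except (json.JSONDecodeError, ValidationError):
        return None

ok = parse_call('{"city": "Berlin"}')        # validates
bad = parse_call('{"temperature": "hot"}')   # missing required field -> None
```

Only when `parse_call` returns a validated object would you dispatch to the real function.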


## Training Details

### Training Data

See the [`llama_functions` dataset](https://huggingface.co/datasets/marclove/llama_functions) for more information.

### Training Procedure

Coming soon

#### Training Hyperparameters

Coming soon

#### Sizes

13B & 70B chat and non-chat versions coming soon

## Evaluation

Coming soon

## Citation

```
@misc{LlamaFunctions,
  title = {LlamaFunctions: An Open Dataset of Structured API Calls From Natural Language Prompts},
  author = {Marc Love},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
  howpublished = {\url{https://huggingface.co/marclove/llama_functions}},
}
```