TheBloke committed
Commit 3cc4ec2 · 1 Parent(s): 7c00623

Upload README.md

Files changed (1):
  README.md +13 -6
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 base_model: Nexusflow/NexusRaven-V2-13B
 inference: false
-license: llama2
+license: other
 model-index:
 - name: NexusRaven-13B
   results: []
@@ -98,8 +98,15 @@ User Query: {prompt}<human_end>
 ```
 
 <!-- prompt-template end -->
+<!-- licensing start -->
+## Licensing
 
+The creator of the source model has listed its license as `other`, and this quantization has therefore used that same license.
 
+As this model is based on Llama 2, it is also subject to the Meta Llama 2 license terms, and the license files for that are additionally included. It should therefore be considered as being claimed to be licensed under both licenses. I contacted Hugging Face for clarification on dual licensing, but they do not yet have an official position. Should this change, or should Meta provide any feedback on this situation, I will update this section accordingly.
+
+In the meantime, any questions regarding licensing, and in particular how these two licenses might interact, should be directed to the original model repository: [Nexusflow's NexusRaven V2 13B](https://huggingface.co/Nexusflow/NexusRaven-V2-13B).
+<!-- licensing end -->
 <!-- compatibility_gguf start -->
 ## Compatibility
 
@@ -212,12 +219,12 @@ Windows Command Line users: You can set the environment variable by running `set
 Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 
 ```shell
-./main -ngl 35 -m nexusraven-v2-13b.Q4_K_M.gguf --color -c 16384 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Function:\ndef function_here(arg1):\n """\n Comments explaining the function here\n\n Args:\n list args\n\n Returns:\n list returns\n """\n\nFunction:\ndef another_function_here(arg1):\n ...\n\nUser Query: {prompt}<human_end>"
+./main -ngl 35 -m nexusraven-v2-13b.Q4_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Function:\ndef function_here(arg1):\n """\n Comments explaining the function here\n\n Args:\n list args\n\n Returns:\n list returns\n """\n\nFunction:\ndef another_function_here(arg1):\n ...\n\nUser Query: {prompt}<human_end>"
 ```
 
 Change `-ngl 35` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
 
-Change `-c 16384` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically. Note that longer sequence lengths require much more resources, so you may need to reduce this value.
+Change `-c 2048` to the desired sequence length. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically. Note that longer sequence lengths require much more resources, so you may need to reduce this value.
 
 If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
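The `-p` string above is the NexusRaven-V2 function-calling template collapsed onto one line. As a minimal illustrative sketch (the `get_weather` function, the `build_prompt` helper, and the query are hypothetical, not part of this card), the same template can be assembled programmatically:

```python
# Hypothetical helper: build the "Function: ... User Query: ...<human_end>"
# prompt that the -p argument above encodes, from Python function definitions.
import inspect

def get_weather(city: str):
    """Returns the current weather for the given city."""

def build_prompt(user_query, functions):
    parts = []
    for fn in functions:
        signature = f"def {fn.__name__}{inspect.signature(fn)}:"
        docstring = inspect.getdoc(fn) or ""
        parts.append(f'Function:\n{signature}\n    """\n    {docstring}\n    """\n')
    # Per the template, the user query is terminated with <human_end>
    return "\n".join(parts) + f"\nUser Query: {user_query}<human_end>"

prompt = build_prompt("What's the weather like in Paris?", [get_weather])
print(prompt)
```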
@@ -266,7 +273,7 @@ from llama_cpp import Llama
 # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
 llm = Llama(
   model_path="./nexusraven-v2-13b.Q4_K_M.gguf",  # Download the model file first
-  n_ctx=16384,  # The max sequence length to use - note that longer sequence lengths require much more resources
+  n_ctx=2048,  # The max sequence length to use - note that longer sequence lengths require much more resources
   n_threads=8,  # The number of CPU threads to use, tailor to your system and the resulting performance
   n_gpu_layers=35  # The number of layers to offload to GPU, if you have GPU acceleration available
 )
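As a usage sketch for the `llm` object constructed above (the call pattern is standard llama-cpp-python; the function definition and query are illustrative placeholders):

```python
# Illustrative completion call; the prompt follows the template shown earlier.
prompt = (
    "Function:\n"
    "def get_weather(city: str):\n"
    '    """\n'
    "    Returns the current weather for the given city.\n"
    '    """\n'
    "\n"
    "User Query: What's the weather like in Paris?<human_end>"
)

output = llm(
    prompt,
    max_tokens=512,      # cap on the number of tokens to generate
    temperature=0.001,   # very low temperature, per the prompting guide below
    echo=False,          # return only the completion, not the prompt
)
print(output["choices"][0]["text"])
```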
@@ -377,7 +384,7 @@ NexusRaven-V2 is capable of generating deeply nested function calls, parallel fu
 
 ### Quick Start Prompting Guide
 
-Please refer to our notebook, [How-To-Prompt.ipynb](How-To-Prompt.ipynb), for more advanced tutorials on using NexusRaven-V2!
+Please refer to our notebook, [How-To-Prompt.ipynb](https://colab.research.google.com/drive/19JYixRPPlanmW5q49WYi_tU8rhHeCEKW?usp=sharing), for more advanced tutorials on using NexusRaven-V2!
 
 1. We strongly recommend setting sampling to False when prompting NexusRaven-V2.
 2. We strongly recommend a very low temperature (~0.001).
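For the unquantized upstream checkpoint, recommendations 1 and 2 above amount to greedy decoding. A hedged sketch with the Transformers pipeline (these exact settings are an assumption, not taken from this card; with sampling disabled, the very low temperature of recommendation 2 is effectively implied):

```python
# Sketch: "sampling to False" expressed with transformers, against the
# unquantized upstream model (for GGUF files, use llama.cpp as shown above).
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Nexusflow/NexusRaven-V2-13B",
    device_map="auto",
)

prompt = "Function:\ndef get_weather(city: str):\n    ...\n\nUser Query: What's the weather like in Paris?<human_end>"

result = pipe(
    prompt,
    max_new_tokens=512,
    do_sample=False,         # recommendation 1: greedy decoding, no sampling
    return_full_text=False,  # return only the generated function call
)
print(result[0]["generated_text"])
```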
@@ -468,7 +475,7 @@ For a deeper dive into the results, please see our [Github README](https://githu
 3. The explanations generated by NexusRaven-V2 might be incorrect. Please ensure proper guardrails are present to capture errant behavior.
 
 ## License
-This model was trained on commercially viable data and is licensed under the [Llama 2 community license](https://huggingface.co/codellama/CodeLlama-13b-hf/blob/main/LICENSE) following the original [CodeLlama-13b-hf](https://huggingface.co/codellama/CodeLlama-13b-hf/) model.
+This model was trained on commercially viable data and is licensed under the [Nexusflow community license](https://huggingface.co/Nexusflow/NexusRaven-V2-13B/blob/main/LICENSE.txt).
 
 
 ## References
 