license: unknown
This is an alpha release for testing and feedback; there are known issues (see Known Issues below). I am already training the next version, but given the long training times, I'd appreciate any feedback in the interim.
Details
- Base model: LongLORA 70B (no base instruction tuning)
- Fine-tuned with Llama-2 chat format
- System prompt:
An interaction between a user providing instructions, and an imaginative assistant providing responses.
- 32K context length; use Linear RoPE Scaling = 8 (IMPORTANT: use a factor of 8 even if you are not using the full 32K context length)
- This model is not censored, and is capable of producing offensive and NSFW content. Please use this model with caution, and do not use if you are offended by such content.
- License: Unsure. It uses some datasets which were generated using GPT-4 outputs, so OpenAI's terms may apply. I personally have no objection to this model being used for any commercial or non-commercial purpose, but please respect the license agreements of Meta, OpenAI, and any other parties involved.
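For reference, the linear RoPE scaling factor of 8 is what stretches Llama-2's native 4096-token window to the full 32K. A minimal sketch of the relevant numbers and the scaling config you would pass when loading with Hugging Face transformers (the loading call is shown only in a comment; exact arguments may vary with your transformers version):

```python
# Linear RoPE scaling: Llama-2's native 4096-token context, stretched 8x.
BASE_CONTEXT = 4096
ROPE_FACTOR = 8                              # use 8 even below the full 32K
MAX_CONTEXT = BASE_CONTEXT * ROPE_FACTOR     # 32768 tokens

# Pass this dict when loading with Hugging Face transformers, e.g.:
#   AutoModelForCausalLM.from_pretrained(model_path,
#       rope_scaling={"type": "linear", "factor": 8.0}, device_map="auto")
rope_scaling = {"type": "linear", "factor": float(ROPE_FACTOR)}
print(MAX_CONTEXT)
```

Backends like Exllamav2 expose the same setting as a compress/scale factor; the key point is that it must be 8 regardless of how much context you actually use.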
Available Quantizations:
- bfloat16
- EXL2 4bit can run with 16K context in 2x24GB using Exllamav2
- EXL2 6bit can run with the full 32K context in 3x24GB using Exllamav2
Hopefully someone can make some GGUFs.
Main functions:
- Story Co-writing: Co-write a story with guided prompts over a 32K context, staying consistent with prior story details, capable of writing both long and short scenes. Start by explaining that you are writing a story scene-by-scene, provide some background, themes/tags, and describe what you want in the first scene.
- Oneshot Story-writing: Write a complete story in one go, based on an outline, themes/tags, etc. Make sure you explain that this is not a scene-by-scene writing, and is meant to be written in a single go. You can specify a word-count to shoot for (though the model may not respect it).
- Document Search/Analysis: Reading comprehension & finding information from a long document, or sets of documents (up to 32K tokens)
Secondary Functions (limited training so far, model may get confused between task types):
- Roleplaying (RP): Explain what RP is, set up a scenario and characters, and start the RP. You can specify rules like OOC conventions, use of emojis, etc.
- Interactive Fiction (IF) Emulation: Adventure game/interactive fiction emulation like Zork, Anchorhead, etc. Explain what it is, and how the AI should respond, specify the kind of game, tags, and so on. You can interact with usual commands like 'north', 'examine book', etc.
- Choose Your Own Adventure (CYOA) Emulation: Explain what you're looking for and how you want the AI to respond (e.g., with a numbered list of prompts at the end of each turn), then pick which option you want the story/game to follow. Most human-written games of this kind tend to have 1-2 prompts, so I had a hard time getting the AI to give more options. Finetuning is helping, but this capability is still only half-baked.
- Document Summary/Editing: Brief or comprehensive summaries of a long document, or sets of documents, in various formats (prose, bulleted list, table). Can also do some limited re-writing, conversions between formats and grammar checking.
- General Chatting: Explain that it is a general chat, or provide some preamble to your interaction before starting. Otherwise the model might not know if you want to RP, story-write or something else.
- General Logic/Reasoning: Same guidelines as above.
Prompting Guidelines
- Treat the first prompt like you normally would the system prompt
- System prompt itself does not change
- Describe what you want the AI to do in detail in the first prompt, even if you feel it is obvious. This is how the AI can tell what sort of task it is supposed to perform (story-writing, RP, adventure game emulation, summarization, and so on).
- After that, specify anything else you want in the first prompt (your instructions for the next response, for instance).
- Bias the length of the output with your prompt. This is no guarantee, so you may need to regenerate if you don't get your preferred length. The model will easily produce 2000+ tokens (e.g., for a story scene), so make sure your response limit can handle that.
- E.g., statements like "Make this a long response" bias the response longer, while statements like "Respond briefly" bias it shorter.
- Explain clearly if you want the content to be SFW or NSFW in the first prompt as well. However, there are no guarantees that the model won't generate NSFW content if you force it to, in a later prompt, even if you specify the content should be SFW at the start. It's just a statistical bias (that should get better with more training).
- Give the model details to go on. The more you flesh out what you want, the better and more consistently it will write. Tiny prompt = Tiny story, and more ChatGPTisms.
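Putting these guidelines together, here is a minimal sketch of assembling the first turn in the standard Llama-2 chat format (the task description is an invented example; the system prompt is the fixed one from above):

```python
SYSTEM = ("An interaction between a user providing instructions, "
          "and an imaginative assistant providing responses.")

def first_prompt(task_description: str) -> str:
    """Wrap the first user turn in Llama-2 chat format with the fixed system prompt."""
    return f"[INST] <<SYS>>\n{SYSTEM}\n<</SYS>>\n\n{task_description} [/INST]"

# Hypothetical example: be explicit about task type, content rating, and length,
# even when it feels obvious -- this is how the model identifies the task.
prompt = first_prompt(
    "We are co-writing a story scene-by-scene. Keep it SFW. "
    "Setting: a lighthouse on a remote island, 1920s. Themes: mystery, isolation. "
    "First scene: the keeper discovers an unmarked boat at dawn. "
    "Make this a long response."
)
print(prompt)
```

Later turns only need your instructions for the next response; the system prompt and task framing from the first turn carry through the conversation.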
Known Issues
- Blank Outputs: When you have many short prompts, the model sometimes produces just the EOS token, especially with RP and adventure game emulation. I believe this is due to this issue. It should be fixed in the next iteration; meanwhile, workarounds:
  - Use "Start reply with" in Oobabooga to force the first token to be something other than </s>.
  - Ban the EOS token (though you then need to stop the generation manually).
  - Strip the space after the final [/INST], though I don't know of an easy way to do that in Oobabooga without writing code.
  - Ban the EOS token only for the first generated token (seems like a good idea to always have), though again I don't know of an easy way to do that in Oobabooga without writing code.
  - Wait for the next iteration, where I think I have it fixed! Airoboros went through the same issue when they switched to the Llama-chat format.
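If you generate through Hugging Face transformers rather than Oobabooga, `generate(..., begin_suppress_tokens=[tokenizer.eos_token_id])` should implement the "ban EOS only for the first token" idea directly. The underlying trick is just a logits mask on step 0, sketched here in plain Python (the token IDs and vocabulary are made up for illustration):

```python
import math

def suppress_eos_on_first_step(logits, num_generated, eos_id):
    """Return logits with EOS masked out, but only before any token has been generated."""
    if num_generated == 0:
        logits = list(logits)          # copy so the caller's logits are untouched
        logits[eos_id] = -math.inf     # EOS can never win the argmax on step 0
    return logits

# Toy example: EOS (id 2) has the highest raw logit on the first step.
raw = [0.1, 0.5, 3.0, 1.2]
step0 = suppress_eos_on_first_step(raw, num_generated=0, eos_id=2)
best = max(range(len(step0)), key=step0.__getitem__)
print(best)   # index 3: a non-EOS token is chosen instead
```

On later steps (`num_generated > 0`) the logits pass through unchanged, so the model can still end its reply normally.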
- Lack of Diversity for NSFW Content: Some common phrases & descriptions are over-used. I believe I know why this is, and believe it can be fixed with more training/diverse content (which is currently being done).
- ChatGPTisms: Not refusals, but canned responses, happy endings, that sort of thing. Though not common, these show up even though they were NOT in the training data, possibly because base Llama-2 has them baked in. I will eventually fight this with DPO (using prompt-biased GPT-4 generated responses as the rejected option), but for now, regenerate or prompt-engineer around it.
Training Data:
For 70% of the training data, the outputs were written by humans, though some of the inputs may have been originally seeded by GPT-4 (and expanded using other LORAs).
For 30% of the training data, the outputs were written by GPT-3.5/4, but this was mostly logical reasoning, summarization and other non-creative content (with no issue of refusals or alignment). Some GPT4-generated creative content was present in the RP proxy logs, however.
Partial List (there are other tiny datasets that I am yet to sort out and list, but this is the bulk of the training data):
- Human-written stories from forums, The Pile and other sources, broken into Q&A and long-form multi-round completion format, using other LORAs.
- Will publish the relevant LORAs and data sources as best I can, once I sort everything out.
- Summaries of Wikipedia articles in various formats (generated by `gpt-4-1106-preview`; will publish as soon as I can).
- Document Error Correction: GPT4-generated passages with errors/typos introduced using the Python `typo` library as input, with the original (correct) passages as output.
- Sections of Airoboros 2.2.1/3.1 (RP, chain-of-thought, rules-based chats, theory of mind, dealigned writing, jokes/riddles).
- Sections of Surge Instruct (for extraction, summarization, re-writing, classification).
- Proxy RP Logs (GPT4 outputs only): Jannie, Teatime and AICG logs were re-stitched into single seamless conversations (sometimes from the original source) to undo the 2K or 4K divisions, and augmented with more context and rules about the conversation in the first prompt. Will publish the stitched-up versions when I can.
- A fully re-generated version of Floyd Text Adventures with better context and AI interaction format. Here is the link to the original until I upload the modified version.
- A fully re-generated version of the CYS CYOA dataset (re-generated from source by 'dungeon crawling' the space automatically, maximizing visiting unique 'rooms', then converting the output logs into a chat format).
- NART synthetic therapy logs were heavily filtered and used cautiously (lots of GPTisms, but actually relevant in this context, where the AI plays a supportive role).
- Augmental Stenisgate RP was modified to add more context, and make the AI only play a single character (I'll publish the modded version as soon as I can).
- Bluemoon RP was fully re-generated using Karen The Editor to clean it up. Until I publish the modified data on HF, you can get it from here: Part 1 Part 2
- PIPPA RP was augmented to add more context and rules (derived from the content of the conversation). Will update with a link to the modified version.
- LimaRP was slightly augmented to add more context, and the divided conversations were stitched together. No conversations were ever split up.
- Erotic Analysis was used in reverse for one-shot NSFW story generation.
- Reading Comprehension
- Unnatural Instructions for word-constrained generation.
- Long Instructions for relevant document finding/retrieval up to 32K.
- OpenOrca (GPT4 outputs only).
- Ultrachat Uncensored with capitalization errors fixed & further scrubbed for GPTisms (not just refusals, sentiment as well).
- ShareGPT Hyper Filtered further scrubbed for GPTisms (not just refusals, sentiment as well).
- Claude Multiround also further scrubbed, but being a different model than GPT4 I may not have caught all the gushing positivity.
- Wizard Vicuna Unfiltered further scrubbed like the others.
- TinyStories GPT4 (I may not include this in the future).
- SODA Synthetic Dialogue used with caution (mostly for title suggestions).