{ "cells": [ { "cell_type": "markdown", "source": [ "## To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n", "\n", "To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).\n", "\n", "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save) (eg for Llama.cpp).\n", "\n", "**Llama-3 8b is trained on a crazy 15 trillion tokens! Llama-2 was 2 trillion.**" ], "metadata": { "id": "IqM-T1RTzY6C" } }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "2eSvM9zX_2d3" }, "outputs": [], "source": [ "%%capture\n", "# Installs Unsloth, Xformers (Flash Attention) and all other packages!\n", "!pip install \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git\"\n", "!pip install --no-deps \"xformers<0.0.26\" trl peft accelerate bitsandbytes" ] }, { "cell_type": "code", "source": [ "token = \"\" #HF Token" ], "metadata": { "id": "ArRuzUceFbrW" }, "execution_count": 14, "outputs": [] }, { "cell_type": "markdown", "source": [ "* We support Llama, Mistral, CodeLlama, TinyLlama, Vicuna, Open Hermes etc\n", "* And Yi, Qwen ([llamafied](https://huggingface.co./models?sort=trending&search=qwen+llama)), Deepseek, all Llama, Mistral derived archs.\n", "* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.\n", "* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.\n", "* [**NEW**] With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co./unsloth) has Llama, Mistral 4bit models." 
], "metadata": { "id": "r2v_X2fA0Df5" } }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 330, "referenced_widgets": [ "5fa5a89d00e44a099fcca268ebe73221", "c892fcfedd2d455daae59cba8a33fb2e", "278879a1abac4c3e8d0da4185551b34b", "e7471ad97b9346af9848783ef856e68a", "731f7e7a077f491eb9b53ad0a073810b", "74e57796898d4a769ceea9b51a3ac898", "6f366b5396734ec69d5278eb09044139", "6f72bbf0c6aa46d086ceb5da9ad9bb17", "661e6a316b304db8982810d870e02b67", "e78e842c8d454a9a9a78916c6cc67a0e", "e2a0f7306928486c82696c5aa10c62f8", "dc6b34ddb52e47f781c925701e012140", "84abffa3a2584132b4464048826d1ee3", "e018d1caab3f4e25aa6d0f72854e8214", "ce867c8eac414ccaba089813cad21046", "78ea1e7696e74396beb24769a727eed5", "2afd8f4c90194c8f8e4855101d2869b2", "e6fc052617ca4302ba573b13319e66d7", "db050984996448819c499c576b026f3d", "ef3f89cc612847a9b0b7b7f6cc69a85c", "c4594b03922346f2aa4c3d15d2c610e3", "48546d763782408f8d002bf12a391e89", "dca5d7bfce674fbd86b072b61a9fc900", "69ced868adc74863bd419943732c3935", "36ee0e065ac44262ac959db0276e492b", "5cb2a07f33ce4828b42ab3c65885976c", "2bb09009631c471f879c7d57f52d56be", "a80b037a8f6a47ce8fdd067f256dc1d3", "66d9cdbb00c74ad3b5078c9da8135c89", "0e8c8d5aa4c24813b44a638f9cdf0ecc", "62e7be83d4504cb697e83045f8bebcdc", "07d07944a82d4d2f9a865c24ca55d702", "63435b5647164359af9ecf39c06f4cba", "207b65186ac245ed9a1e389f453ec549", "411192405d524e7e803a9537753f1890", "f2ebbe9d84c24b588c385a9708ef2ea7", "79b3635ea7314d1583b0d23c6e5e849a", "bb69ca520a034a35851c74aedd16d587", "e77d3c9d25d54141aec449f61b3d3b1b", "16c7037582774c0a8254933c5eee77f9", "8564d706e3bb4d738ca445265e68a2eb", "de067aad4aaf42168cb7b632825d0364", "6c97d6dd23e340de8e880965bade938e", "e4e980773a704a8782eabfde149a2abe", "345aa8bc9f7149fcbf4b567be1012ff2", "8edd242d5289430595b1a2c39b680112", "e7e70fff105b4d16b4fb6c375cb44378", "49deeb922b1b4af18c155457edd65d1a", "6f511cf5f01a413aa8a0e48eb030fbf7", "3451975ededc45b699326115911eaa4f", "f79764846ccf4e7391a5dee0671bd7ac", "17b9c6a3d39a4ee28ed1af6e1e62760d", "cfb71364a2504c9a825faaa362ad036b", "fdca272ee71441059863076d62294320", "d84431b47c1b40f3af826b0d0630ac99", "5596db43b132418fbf02b86735e87e76", "17215bea9f414c18ae2b96d34da1b289", "45cda83b578e4413aba4aa1a36827125", "e72257591dda4072b37fa98fd0b5b7a8", "29a9ced97952442ebb8bb08d37c0dd30", "4ce00cd38cdb42a69535fd2e2f027484", "eb3ce4f3268140aca5a80e0be96ec4c0", "760bc5f1468f472993a18d27ef3d431d", "8a53839f9eb74eb9ac745f733b203708", "be5aa8c272634748839200f8be456304", "9141dd38302f421fa610565c717050ed" ] }, "id": "QmUBVEnvCDJv", "outputId": "cbae76b5-56a7-4f9f-ee7f-301d9de04eab" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "πŸ¦₯ Unsloth: Will patch your computer to enable 2x faster free finetuning.\n" ] }, { "output_type": "display_data", "data": { "text/plain": [ "config.json: 0%| | 0.00/1.18k [00:00 0 ! 
Suggested 8, 16, 32, 64, 128\n", " target_modules = [\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n", " \"gate_proj\", \"up_proj\", \"down_proj\",],\n", " lora_alpha = 16,\n", " lora_dropout = 0, # Supports any, but = 0 is optimized\n", " bias = \"none\", # Supports any, but = \"none\" is optimized\n", " # [NEW] \"unsloth\" uses 30% less VRAM, fits 2x larger batch sizes!\n", " use_gradient_checkpointing = \"unsloth\", # True or \"unsloth\" for very long context\n", " random_state = 3407,\n", " use_rslora = False, # We support rank stabilized LoRA\n", " loftq_config = None, # And LoftQ\n", ")" ] }, { "cell_type": "markdown", "source": [ "\n", "### Data Prep\n", "We now use the Alpaca dataset from [yahma](https://huggingface.co./datasets/yahma/alpaca-cleaned), which is a filtered version of 52K of the original [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html). You can replace this code section with your own data prep.\n", "\n", "**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co./docs/trl/sft_trainer#train-on-completions-only).\n", "\n", "**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! Otherwise you'll get infinite generations!\n", "\n", "If you want to use the `ChatML` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing).\n", "\n", "For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)." ], "metadata": { "id": "vITh0KVJ10qX" } }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "LjY75GoYUCB8", "colab": { "base_uri": "https://localhost:8080/", "height": 145, "referenced_widgets": [ "18d30950b13d4a78a4343957e7f875dd", "a2716018b641491bae1881cf539a94f5", "9d3051de0b854419870affae286f4a8a", "71e15b6c94804e0e88749a95f108860c", "7f77f005908546ac8bc8acfcaac86fd2", "7067c4367d244ae6a843b68317cbab1f", "761dd71092324dce89751e4cf79a1de2", "6d51ab8a582548459d9df2d322300d9f", "2d170043f93b466ea4733bb9dec165ac", "0f8ac9a63a654d09a6034636fc4a42b2", "2b2b42e3a52949b18ded26e85a294283", "b5c85fa0e0d4414984dab1b73aad9693", "61aedca85e86487695d870db2f142fa5", "fd37e6a3aa7f479a8ee3437ea3253554", "75fbeba7aed642319984b56b8ad2b39d", "06ce992fa40a473bb4e5344ed03c07a3", "82e736d269d147129dfbdfa7c193bed2", "16dd163be533414ca942353bf205e179", "4fb415246b7b4d549359943b2ff35f7c", "e80998e990394aaa9f961758814bea5c", "b9cc12a5cdb841b7ba653cde671d20c5", "a72b7e9ba5ba4b65b60600de0ed97696", "2edfa2e1a767441f8daea1de8f34cb30", "8aceb51199224e4dbd9c7c407e192bf9", "1aef1946d5dc4650934634b6fe3b93c6", "f0d9befb2a4b48a19832621fc25c08ca", "b95995b9309a44c1ae4ec6a80ae266d9", "3536247e3a7041a5a3c81bd0da605a7d", "e13540248b4648229bc75e47629ba626", "80af87f4b4354a17b765a57b6820a066", "18958d3681ca4757b3994c2ac6cbdf8f", "492c16d14a9a4bce82e9c8415560bb4e", "2bfdc58a58dd4b0a8f3710365b395e24", "6e71d0442bd4465898d243a98e4fb20c", "288cb756da5543ef8328ab33372ced05", "9ac4033b492d4468bed4cf02af57d665", "ab1f1221506f4912a24627dacc70e086", "fe4d0f8ef67a485da2bc097cf4a42ad1", "4232daa50677433187899d81f3f3e0fb", "aa7688b65ab54912b0c5f5e1883290fb", "9923ab697b1d4433abe636c834ffe03f", "30e286474b174321967403bc4557b94f", "7741a5531d7745dba1859f5f2df34467", "63ffea0c36de442ab0794ff479ac578d" ] }, "outputId": "006884a0-57e1-4d94-920e-341a71458690" }, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "Downloading 
readme: 0%| | 0.00/11.6k [00:00\n", "### Train the model\n", "Now let's use Huggingface TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co./docs/trl/sft_trainer). We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`. We also support TRL's `DPOTrainer`!" ], "metadata": { "id": "idAEIeSQ3xdS" } }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "95_Nn-89DhsL", "colab": { "base_uri": "https://localhost:8080/", "height": 121, "referenced_widgets": [ "cc7f5f8d9df64d869dbebfe4ed71295a", "56d9cae4ad6946c7a8bc1227f361b768", "9111ce81bf1041a8916fc32d0aa2531f", "d5c6e5bbb91a4414a754cbce49c3d3c0", "f7a3ab8bd28e438b89d29a99b7a9e3f7", "9dc9a21504e445c1b2995a3cf0f67596", "3272610e33bd45a8ac43ddcc0f001ea0", "48af53b51a9842abb3705dd6a6d7c59a", "9a5dd25421d8427780d1179c175ba180", "da30f83adc914f44be4b6c67efb11277", "3440efc191e14664b7bbaaeacbb2167c" ] }, "outputId": "f208f3a3-8842-426a-bc42-4ec9588fb3cf" }, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "/usr/local/lib/python3.10/dist-packages/multiprocess/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.\n", " self.pid = os.fork()\n" ] }, { "output_type": "display_data", "data": { "text/plain": [ "Map (num_proc=2): 0%| | 0/51760 [00:00" ], "text/html": [ "\n", "
\n", " \n", " \n", " [60/60 07:20, Epoch 0/1]\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StepTraining Loss
11.844800
22.314800
31.715800
41.969700
51.662300
61.639100
71.186400
81.262000
91.096100
101.167900
110.982300
121.008600
130.937300
141.059800
150.901200
160.908000
171.020400
181.286500
191.018200
200.901200
210.948300
221.010900
230.887600
240.997300
251.073700
261.007800
271.034100
280.872100
290.833300
300.895500
310.868800
320.859800
330.975900
340.839500
350.956800
360.854200
370.881900
380.758600
391.073600
401.147900
410.896800
420.980700
430.946500
440.891600
450.916500
460.990200
470.864000
481.208100
490.902800
501.039500
511.011000
520.916400
530.992100
541.158300
550.794600
561.023300
570.881000
580.821800
590.854500
600.900100

" ] }, "metadata": {} } ], "source": [ "trainer_stats = trainer.train()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "id": "pCqnaKmlO1U9", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "1791f5cf-83a6-4411-cbcb-49e6205a33d8", "cellView": "form" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "461.2959 seconds used for training.\n", "7.69 minutes used for training.\n", "Peak reserved memory = 7.529 GB.\n", "Peak reserved memory for training = 1.935 GB.\n", "Peak reserved memory % of max memory = 51.051 %.\n", "Peak reserved memory for training % of max memory = 13.12 %.\n" ] } ], "source": [ "#@title Show final memory and time stats\n", "used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n", "used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n", "used_percentage = round(used_memory /max_memory*100, 3)\n", "lora_percentage = round(used_memory_for_lora/max_memory*100, 3)\n", "print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n", "print(f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\")\n", "print(f\"Peak reserved memory = {used_memory} GB.\")\n", "print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n", "print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n", "print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")" ] }, { "cell_type": "markdown", "source": [ "\n", "### Inference\n", "Let's run the model! You can change the instruction and input - leave the output blank!" ], "metadata": { "id": "ekOmTR1hSNcr" } }, { "cell_type": "code", "source": [ "# alpaca_prompt = Copied from above\n", "FastLanguageModel.for_inference(model) # Enable native 2x faster inference\n", "inputs = tokenizer(\n", "[\n", " alpaca_prompt.format(\n", " \"Continue the fibonnaci sequence.\", # instruction\n", " \"1, 1, 2, 3, 5, 8\", # input\n", " \"\", # output - leave this blank for generation!\n", " )\n", "], return_tensors = \"pt\").to(\"cuda\")\n", "\n", "outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)\n", "tokenizer.batch_decode(outputs)" ], "metadata": { "id": "kR3gIAX-SM2q", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "d51bc334-8565-46da-f38e-5edfabad4f5b" }, "execution_count": 11, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "['<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. \\nWrite a response that appropriately completes the request.\\n\\n### Instruction:\\nContinue the fibonnaci sequence.\\n\\n### Input:\\n1, 1, 2, 3, 5, 8\\n\\n### Response:\\n13, 21, 34, 55, 89, 144, 233, 377, 610, 987<|end_of_text|>']" ] }, "metadata": {}, "execution_count": 11 } ] }, { "cell_type": "markdown", "source": [ " You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting the whole time!" 
], "metadata": { "id": "CrSvZObor0lY" } }, { "cell_type": "code", "source": [ "# alpaca_prompt = Copied from above\n", "FastLanguageModel.for_inference(model) # Enable native 2x faster inference\n", "inputs = tokenizer(\n", "[\n", " alpaca_prompt.format(\n", " \"Continue the fibonnaci sequence.\", # instruction\n", " \"1, 1, 2, 3, 5, 8\", # input\n", " \"\", # output - leave this blank for generation!\n", " )\n", "], return_tensors = \"pt\").to(\"cuda\")\n", "\n", "from transformers import TextStreamer\n", "text_streamer = TextStreamer(tokenizer)\n", "_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)" ], "metadata": { "id": "e2pEuRb1r2Vg", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "b4c42b49-8eca-4169-aea8-81dd70a44980" }, "execution_count": 12, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.\n" ] }, { "output_type": "stream", "name": "stdout", "text": [ "<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. \n", "Write a response that appropriately completes the request.\n", "\n", "### Instruction:\n", "Continue the fibonnaci sequence.\n", "\n", "### Input:\n", "1, 1, 2, 3, 5, 8\n", "\n", "### Response:\n", "13, 21, 34, 55, 89, 144<|end_of_text|>\n" ] } ] }, { "cell_type": "markdown", "source": [ "\n", "### Saving, loading finetuned models\n", "To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.\n", "\n", "**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!" ], "metadata": { "id": "uMuVrWbjAzhc" } }, { "cell_type": "code", "source": [ "#model.save_pretrained(\"lora_model\") # Local saving\n", "#tokenizer.save_pretrained(\"lora_model\")\n", "model.push_to_hub(\"ArunKr/LLama3-LoRA\", token = token) # Online saving\n", "tokenizer.push_to_hub(\"ArunKr/LLama3-LoRA\", token = token) # Online saving" ], "metadata": { "id": "upcOlWe7A1vc" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:" ], "metadata": { "id": "AEEcJ4qfC7Lp" } }, { "cell_type": "code", "source": [ "if False:\n", " from unsloth import FastLanguageModel\n", " model, tokenizer = FastLanguageModel.from_pretrained(\n", " model_name = \"lora_model\", # YOUR MODEL YOU USED FOR TRAINING\n", " max_seq_length = max_seq_length,\n", " dtype = dtype,\n", " load_in_4bit = load_in_4bit,\n", " )\n", " FastLanguageModel.for_inference(model) # Enable native 2x faster inference\n", "\n", "# alpaca_prompt = You MUST copy from above!\n", "\n", "inputs = tokenizer(\n", "[\n", " alpaca_prompt.format(\n", " \"What is a famous tall tower in Paris?\", # instruction\n", " \"\", # input\n", " \"\", # output - leave this blank for generation!\n", " )\n", "], return_tensors = \"pt\").to(\"cuda\")\n", "\n", "outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)\n", "tokenizer.batch_decode(outputs)" ], "metadata": { "id": "MKX_XKs_BNZR" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "You can also use Hugging Face's `AutoModelForPeftCausalLM`. Only use this if you do not have `unsloth` installed. It can be hopelessly slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**." 
], "metadata": { "id": "QQMjaNrjsU5_" } }, { "cell_type": "code", "source": [ "if False:\n", " # I highly do NOT suggest - use Unsloth if possible\n", " from peft import AutoPeftModelForCausalLM\n", " from transformers import AutoTokenizer\n", " model = AutoPeftModelForCausalLM.from_pretrained(\n", " \"lora_model\", # YOUR MODEL YOU USED FOR TRAINING\n", " load_in_4bit = load_in_4bit,\n", " )\n", " tokenizer = AutoTokenizer.from_pretrained(\"lora_model\")" ], "metadata": { "id": "yFfaXG0WsQuE" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### Saving to float16 for VLLM\n", "\n", "We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co./settings/tokens for your personal tokens." ], "metadata": { "id": "f422JgM9sdVT" } }, { "cell_type": "code", "source": [ "# Merge to 16bit\n", "if False: model.save_pretrained_merged(\"model\", tokenizer, save_method = \"merged_16bit\",)\n", "if True: model.push_to_hub_merged(\"ArunKr/LLama3-LoRA\", tokenizer, save_method = \"merged_16bit\", token = token)\n", "\n", "# Merge to 4bit\n", "if False: model.save_pretrained_merged(\"model\", tokenizer, save_method = \"merged_4bit\",)\n", "if True: model.push_to_hub_merged(\"ArunKr/LLama3-LoRA\", tokenizer, save_method = \"merged_4bit_forced\", token = token)\n", "\n", "# Just LoRA adapters\n", "if False: model.save_pretrained_merged(\"model\", tokenizer, save_method = \"lora\",)\n", "if True: model.push_to_hub_merged(\"ArunKr/LLama3-LoRA\", tokenizer, save_method = \"lora\", token = token)" ], "metadata": { "id": "iHjt_SMYsd3P" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "### GGUF / llama.cpp Conversion\n", "To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.\n", "\n", "Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):\n", "* `q8_0` - Fast conversion. High resource use, but generally acceptable.\n", "* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.\n", "* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K." 
], "metadata": { "id": "TCv4vXHd61i7" } }, { "cell_type": "code", "source": [ "# Save to 8bit Q8_0\n", "if False: model.save_pretrained_gguf(\"model\", tokenizer,)\n", "if True: model.push_to_hub_gguf(\"ArunKr/LLama3-LoRA\", tokenizer, token = token)\n", "\n", "# Save to 16bit GGUF\n", "if False: model.save_pretrained_gguf(\"model\", tokenizer, quantization_method = \"f16\")\n", "if True: model.push_to_hub_gguf(\"ArunKr/LLama3-LoRA\", tokenizer, quantization_method = \"f16\", token = token)\n", "\n", "# Save to q4_k_m GGUF\n", "if False: model.save_pretrained_gguf(\"model\", tokenizer, quantization_method = \"q4_k_m\")\n", "if True: model.push_to_hub_gguf(\"ArunKr/LLama3-LoRA\", tokenizer, quantization_method = \"q4_k_m\", token = token)" ], "metadata": { "id": "FqfebeAdT073", "colab": { "base_uri": "https://localhost:8080/", "height": 1000, "referenced_widgets": [ "b95397410bbe4938a5810246fdbf5d87", "e9dfbbb6122546cf96c935de6f545829", "c3e151e2004b4758a9237b94f5446765", "7b26c2c076334d13a08fa54d505d2fb0", "a6035912c95f4fa9bfedacf634772fce", "cff5eb22adb64355b5d76c4dddca718f", "80fd72d67b824088b22bc967659089cd", "5441726d51e34c2b807fe948b03b7118", "8a55d1a5551e41b6ae091c11fc425acb", "40233286bb6844bcbf33c0af69d054f0", "0fa835b65c354fe6becb26453dd0be22", "dc3d5743af794ece9037c664e1feffee", "81f40629d1124416beb12244978ba774", "c7f0ebb8408e46e989a9811c1c28ba5b", "f7fb961fbccf423ea009abef48b2133f", "94e951c3ca1b4f1cb001cc66095b3aef", "7b3fb50ad3aa4f008291bb23546e9171", "a4852d21d217479db51ef5eeeba0e056", "9d4d416a49154cac80c1ba95f63632ab", "9ba83b26feb448e18e31a231dbce85e6", "9194913d7ceb45b29fec97d0170801c3", "f911c583d1b94144be63f841b07dade6", "e73ce1c1e94f4457ae5fbfd22383613d", "5189cf6f04f8466bafa3d573b79edc5e", "cba9f36fa8d543a8ac7735f84b55485e", "92f4b137196440d1bc01c9d765dc77b4", "fe5d71041d604f5f8bc926d21af9f7ca", "b8bf4a433b594071ba4050594a5cfbab", "6f3f8087ee1649ad8006c49d50881752", "253f39b6c34b4a2fbe0845cf87fcf1e5", "e011ea526230447aa7dffc159049e638", "c0c1ee431dbf47e88b01a74a2b686366", "d0fe3fa17f974927972ac7bbeecdae44" ] }, "outputId": "c9db3a5c-fff5-470f-a4b5-2fd9542a5772" }, "execution_count": 13, "outputs": [ { "metadata": { "tags": null }, "name": "stderr", "output_type": "stream", "text": [ "Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.\n", "We shall switch to Pytorch saving, which will take 3 minutes and not 30 minutes.\n", "To force `safe_serialization`, set it to `None` instead.\n", "Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded\n", "model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.\n", "Unsloth: Will remove a cached repo with size 5.7G\n" ] }, { "metadata": { "tags": null }, "name": "stdout", "output_type": "stream", "text": [ "Unsloth: Merging 4bit and LoRA weights to 16bit...\n", "Unsloth: Will use up to 5.72 out of 12.67 RAM for saving.\n" ] }, { "metadata": { "tags": null }, "name": "stderr", "output_type": "stream", "text": [ " 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 15/32 [00:01<00:01, 11.09it/s]We will save to Disk and not RAM now.\n", "100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 32/32 [01:51<00:00, 3.48s/it]\n" ] }, { "metadata": { "tags": null }, "name": "stdout", "output_type": "stream", "text": [ "Unsloth: Saving tokenizer... Done.\n", "Unsloth: Saving model... 
This might take 5 minutes for Llama-7b...\n", "Unsloth: Saving Arun1982/LLama3-LoRA/pytorch_model-00001-of-00004.bin...\n", "Unsloth: Saving Arun1982/LLama3-LoRA/pytorch_model-00002-of-00004.bin...\n", "Unsloth: Saving Arun1982/LLama3-LoRA/pytorch_model-00003-of-00004.bin...\n", "Unsloth: Saving Arun1982/LLama3-LoRA/pytorch_model-00004-of-00004.bin...\n", "Done.\n" ] }, { "metadata": { "tags": null }, "name": "stderr", "output_type": "stream", "text": [ "Unsloth: Converting llama model. Can use fast conversion = False.\n", "Unsloth: We must use f16 for non Llama and Mistral models.\n" ] }, { "metadata": { "tags": null }, "name": "stdout", "output_type": "stream", "text": [ "==((====))== Unsloth: Conversion from QLoRA to GGUF information\n", " \\\\ /| [0] Installing llama.cpp will take 3 minutes.\n", "O^O/ \\_/ \\ [1] Converting HF to GUUF 16bits will take 3 minutes.\n", "\\ / [2] Converting GGUF 16bits to q8_0 will take 20 minutes.\n", " \"-____-\" In total, you will have to wait around 26 minutes.\n", "\n", "Unsloth: [0] Installing llama.cpp. This will take 3 minutes...\n", "Unsloth: [1] Converting model at Arun1982/LLama3-LoRA into f16 GGUF format.\n", "The output location will be ./Arun1982/LLama3-LoRA-unsloth.F16.gguf\n", "This will take 3 minutes...\n", "INFO:hf-to-gguf:Loading model: LLama3-LoRA\n", "INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only\n", "INFO:hf-to-gguf:Set model parameters\n", "INFO:hf-to-gguf:gguf: context length = 8192\n", "INFO:hf-to-gguf:gguf: embedding length = 4096\n", "INFO:hf-to-gguf:gguf: feed forward length = 14336\n", "INFO:hf-to-gguf:gguf: head count = 32\n", "INFO:hf-to-gguf:gguf: key-value head count = 8\n", "INFO:hf-to-gguf:gguf: rope theta = 500000.0\n", "INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05\n", "INFO:hf-to-gguf:gguf: file type = 1\n", "INFO:hf-to-gguf:Set model tokenizer\n", "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n", "INFO:gguf.vocab:Adding 280147 merge(s).\n", "INFO:gguf.vocab:Setting special token type bos to 128000\n", "INFO:gguf.vocab:Setting special token type eos to 128001\n", "INFO:gguf.vocab:Setting special token type pad to 128255\n", "INFO:hf-to-gguf:Exporting model to 'Arun1982/LLama3-LoRA-unsloth.F16.gguf'\n", "INFO:hf-to-gguf:gguf: loading model weight map from 'pytorch_model.bin.index.json'\n", "INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00001-of-00004.bin'\n", "INFO:hf-to-gguf:token_embd.weight, torch.float16 --> F16, shape = {4096, 128256}\n", "INFO:hf-to-gguf:blk.0.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.0.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.0.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.0.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.1.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.1.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.1.attn_v.weight, 
torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.1.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.2.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.2.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.2.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.2.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.3.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.3.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.3.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.3.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.4.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.4.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.4.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.4.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.5.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.5.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.5.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.5.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", 
"INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.6.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.6.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.6.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.6.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.7.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.7.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.7.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.7.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.8.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.8.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.8.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.8.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00002-of-00004.bin'\n", "INFO:hf-to-gguf:blk.9.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.9.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.9.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.9.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.10.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.10.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", 
"INFO:hf-to-gguf:blk.10.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.10.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.11.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.11.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.11.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.11.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.12.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.12.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.12.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.12.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.13.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.13.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.13.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.13.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.14.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.14.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.14.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.14.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.14.ffn_down.weight, 
torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.15.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.15.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.15.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.15.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.16.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.16.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.16.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.16.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.16.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.16.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.16.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.17.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.17.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.17.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.17.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.17.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.17.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.17.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.18.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.18.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.18.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.18.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.18.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.18.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.18.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.19.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.19.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", 
"INFO:hf-to-gguf:blk.19.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.19.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.19.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.19.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.19.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.20.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.20.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.20.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.20.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.20.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00003-of-00004.bin'\n", "INFO:hf-to-gguf:blk.20.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.20.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.21.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.21.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.21.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.21.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.21.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.21.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.21.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.22.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.22.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.22.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.22.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.22.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.22.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.22.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.23.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.23.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.23.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.23.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.23.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.23.ffn_up.weight, torch.float16 
--> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.23.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.24.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.24.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.24.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.24.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.24.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.24.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.24.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.24.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.24.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.25.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.25.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.25.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.25.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.25.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.25.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.25.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.25.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.25.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.26.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.26.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.26.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.26.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.26.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.26.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.26.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.26.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.26.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.27.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.27.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.27.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.27.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.27.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.27.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.27.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.27.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.27.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.28.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", 
"INFO:hf-to-gguf:blk.28.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.28.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.28.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.28.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.28.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.28.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.28.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.28.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.29.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.29.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.29.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.29.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.29.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.29.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.29.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.29.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.29.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.30.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.30.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.30.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.30.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.30.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.30.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.30.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.30.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.30.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.31.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.31.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.31.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.31.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.31.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.31.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00004-of-00004.bin'\n", "INFO:hf-to-gguf:blk.31.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.31.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.31.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:output_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:output.weight, torch.float16 --> F16, shape = {4096, 128256}\n", "Writing: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 16.1G/16.1G [03:47<00:00, 70.7Mbyte/s]\n", "INFO:hf-to-gguf:Model successfully exported to 'Arun1982/LLama3-LoRA-unsloth.F16.gguf'\n", "Unsloth: Conversion completed! 
Output location: ./Arun1982/LLama3-LoRA-unsloth.F16.gguf\n", "Unsloth: [2] Converting GGUF 16bit into q8_0. This will take 20 minutes...\n",
"main: build = 2927 (511182ea)\n", "main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu\n",
"main: quantizing './Arun1982/LLama3-LoRA-unsloth.F16.gguf' to './Arun1982/LLama3-LoRA-unsloth.Q8_0.gguf' as Q8_0 using 4 threads\n",
"llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from ./Arun1982/LLama3-LoRA-unsloth.F16.gguf (version GGUF V3 (latest))\n",
"llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n",
"llama_model_loader: - kv 0: general.architecture str = llama\n",
"llama_model_loader: - kv 1: general.name str = LLama3-LoRA\n",
"llama_model_loader: - kv 2: llama.block_count u32 = 32\n",
"llama_model_loader: - kv 3: llama.context_length u32 = 8192\n",
"llama_model_loader: - kv 4: llama.embedding_length u32 = 4096\n",
"llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336\n",
"llama_model_loader: - kv 6: llama.attention.head_count u32 = 32\n",
"llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8\n",
"llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000\n",
"llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010\n",
"llama_model_loader: - kv 10: general.file_type u32 = 1\n",
"llama_model_loader: - kv 11: llama.vocab_size u32 = 128256\n",
"llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128\n",
"llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2\n",
"llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe\n",
"llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = [\"!\", \"\\\"\", \"#\", \"$\", \"%\", \"&\", \"'\", ...\n",
"llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...\n",
"llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = [\"Δ  Δ \", \"Δ  Δ Δ Δ \", \"Δ Δ  Δ Δ \", \"...\n",
"llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000\n",
"llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001\n",
"llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 128255\n",
"llama_model_loader: - kv 21: general.quantization_version u32 = 2\n",
"llama_model_loader: - type f32: 65 tensors\n",
"llama_model_loader: - type f16: 226 tensors\n",
"[ 1/ 291] token_embd.weight - [ 4096, 128256, 1, 1], type = f16, converting to q8_0 .. size = 1002.00 MiB -> 532.31 MiB\n",
"[ 2/ 291] blk.0.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 3/ 291] blk.0.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 4/ 291] blk.0.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 5/ 291] blk.0.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 6/ 291] blk.0.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q8_0 .. size = 112.00 MiB -> 59.50 MiB\n",
"[ 7/ 291] blk.0.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q8_0 .. size = 112.00 MiB -> 59.50 MiB\n",
"[ 8/ 291] blk.0.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q8_0 .. size = 112.00 MiB -> 59.50 MiB\n",
"[ 9/ 291] blk.0.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n",
"[ 10/ 291] blk.0.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n",
"...\n",
"[ 290/ 291] output_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n",
"[ 291/ 291] output.weight - [ 4096, 128256, 1, 1], type = f16, converting to q8_0 .. size = 1002.00 MiB -> 532.31 MiB\n",
"llama_model_quantize_internal: model size = 15317.02 MB\n",
"llama_model_quantize_internal: quant size = 8137.64 MB\n",
"\n", "main: quantize time = 211180.51 ms\n", "main: total time = 211180.51 ms\n", "Unsloth: Conversion completed! 
Output location: ./Arun1982/LLama3-LoRA-unsloth.Q8_0.gguf\n", "Unsloth: Uploading GGUF to Huggingface Hub...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b95397410bbe4938a5810246fdbf5d87", "version_major": 2, "version_minor": 0 }, "text/plain": [ "LLama3-LoRA-unsloth.Q8_0.gguf: 0%| | 0.00/8.54G [00:00<?, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "output_type": "stream", "name": "stdout", "text": [
"INFO:hf-to-gguf:token_embd.weight, torch.float16 --> F16, shape = {4096, 128256}\n",
"INFO:hf-to-gguf:blk.0.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n",
"INFO:hf-to-gguf:blk.0.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n",
"INFO:hf-to-gguf:blk.0.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n",
"INFO:hf-to-gguf:blk.0.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n",
"INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n",
"INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n",
"INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n",
"INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n",
"INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n",
{4096, 4096}\n", "INFO:hf-to-gguf:blk.4.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.4.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.4.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.5.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.5.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.5.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.5.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.6.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.6.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.6.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.6.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.7.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.7.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.7.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.7.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.8.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.8.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.8.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.8.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.float16 --> F16, shape = 
{4096, 14336}\n", "INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00002-of-00004.bin'\n", "INFO:hf-to-gguf:blk.9.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.9.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.9.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.9.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.10.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.10.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.10.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.10.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.11.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.11.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.11.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.11.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.12.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.12.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.12.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.12.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.13.attn_q.weight, torch.float16 --> 
F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.13.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.13.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.13.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.14.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.14.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.14.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.14.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.15.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.15.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.15.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.15.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.16.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.16.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.16.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.16.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.16.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.16.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.16.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.17.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.17.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.17.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.17.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.17.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", 
"INFO:hf-to-gguf:blk.17.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.17.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.18.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.18.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.18.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.18.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.18.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.18.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.18.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.19.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.19.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.19.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.19.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.19.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.19.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.19.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.20.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.20.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.20.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.20.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.20.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00003-of-00004.bin'\n", "INFO:hf-to-gguf:blk.20.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.20.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.21.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.21.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.21.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.21.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.21.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.21.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.21.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.float16 --> 
F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.22.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.22.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.22.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.22.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.22.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.22.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.22.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.23.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.23.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.23.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.23.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.23.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.23.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.23.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.24.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.24.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.24.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.24.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.24.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.24.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.24.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.24.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.24.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.25.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.25.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.25.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.25.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.25.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.25.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.25.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.25.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.25.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.26.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.26.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.26.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.26.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", 
"INFO:hf-to-gguf:blk.26.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.26.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.26.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.26.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.26.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.27.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.27.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.27.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.27.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.27.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.27.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.27.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.27.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.27.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.28.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.28.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.28.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.28.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.28.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.28.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.28.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.28.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.28.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.29.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.29.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.29.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.29.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.29.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.29.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.29.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.29.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.29.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.30.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.30.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.30.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.30.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.30.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.30.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.30.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.30.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.30.ffn_norm.weight, 
"INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00004-of-00004.bin'\n",
"INFO:hf-to-gguf:blk.31.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n",
"INFO:hf-to-gguf:blk.31.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n",
"INFO:hf-to-gguf:blk.31.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n",
"INFO:hf-to-gguf:output_norm.weight, torch.float16 --> F32, shape = {4096}\n",
"INFO:hf-to-gguf:output.weight, torch.float16 --> F16, shape = {4096, 128256}\n",
"Writing: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 16.1G/16.1G [03:51<00:00, 69.4Mbyte/s]\n",
"INFO:hf-to-gguf:Model successfully exported to 'Arun1982/LLama3-LoRA-unsloth.F16.gguf'\n",
"Unsloth: Conversion completed! Output location: ./Arun1982/LLama3-LoRA-unsloth.F16.gguf\n",
"Unsloth: Uploading GGUF to Huggingface Hub...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "dc3d5743af794ece9037c664e1feffee", "version_major": 2, "version_minor": 0 }, "text/plain": [ "LLama3-LoRA-unsloth.F16.gguf: 0%| | 0.00/16.1G [00:00<?, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "output_type": "stream", "name": "stdout", "text": [
"INFO:hf-to-gguf:token_embd.weight, torch.float16 --> F16, shape = {4096, 128256}\n",
"INFO:hf-to-gguf:blk.0.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n",
4096}\n", "INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.3.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.3.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.3.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.3.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.4.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.4.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.4.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.4.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.5.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.5.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.5.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.5.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.6.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.6.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.6.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.6.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.float16 --> F32, shape = 
{4096}\n", "INFO:hf-to-gguf:blk.7.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.7.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.7.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.7.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.8.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.8.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.8.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.8.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00002-of-00004.bin'\n", "INFO:hf-to-gguf:blk.9.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.9.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.9.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.9.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.10.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.10.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.10.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.10.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.11.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.11.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.11.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.11.attn_output.weight, torch.float16 --> F16, shape = 
{4096, 4096}\n", "INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.12.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.12.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.12.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.12.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.13.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.13.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.13.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.13.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.14.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.14.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.14.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.14.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.15.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.15.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.15.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.15.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", 
"INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.16.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.16.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.16.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.16.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.16.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.16.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.16.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.17.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.17.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.17.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.17.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.17.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.17.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.17.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.18.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.18.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.18.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.18.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.18.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.18.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.18.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.19.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.19.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.19.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.19.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.19.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.19.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.19.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.20.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.20.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.20.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.20.attn_output.weight, torch.float16 
--> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.20.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00003-of-00004.bin'\n", "INFO:hf-to-gguf:blk.20.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.20.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.21.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.21.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.21.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.21.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.21.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.21.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.21.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.22.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.22.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.22.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.22.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.22.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.22.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.22.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.23.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.23.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.23.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.23.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.23.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.23.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.23.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.24.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.24.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.24.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.24.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.24.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.24.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.24.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", 
"INFO:hf-to-gguf:blk.24.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.24.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.25.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.25.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.25.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.25.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.25.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.25.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.25.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.25.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.25.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.26.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.26.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.26.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.26.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.26.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.26.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.26.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.26.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.26.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.27.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.27.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.27.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.27.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.27.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.27.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.27.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.27.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.27.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.28.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.28.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.28.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.28.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.28.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.28.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.28.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.28.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.28.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.29.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.29.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.29.attn_v.weight, torch.float16 --> F16, 
shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.29.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.29.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.29.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.29.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.29.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.29.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.30.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.30.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.30.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.30.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.30.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.30.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.30.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.30.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.30.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.31.attn_q.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.31.attn_k.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.31.attn_v.weight, torch.float16 --> F16, shape = {4096, 1024}\n", "INFO:hf-to-gguf:blk.31.attn_output.weight, torch.float16 --> F16, shape = {4096, 4096}\n", "INFO:hf-to-gguf:blk.31.ffn_gate.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:blk.31.ffn_up.weight, torch.float16 --> F16, shape = {4096, 14336}\n", "INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00004-of-00004.bin'\n", "INFO:hf-to-gguf:blk.31.ffn_down.weight, torch.float16 --> F16, shape = {14336, 4096}\n", "INFO:hf-to-gguf:blk.31.attn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:blk.31.ffn_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:output_norm.weight, torch.float16 --> F32, shape = {4096}\n", "INFO:hf-to-gguf:output.weight, torch.float16 --> F16, shape = {4096, 128256}\n", "Writing: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 16.1G/16.1G [03:49<00:00, 69.9Mbyte/s]\n", "INFO:hf-to-gguf:Model successfully exported to 'Arun1982/LLama3-LoRA-unsloth.F16.gguf'\n", "Unsloth: Conversion completed! Output location: ./Arun1982/LLama3-LoRA-unsloth.F16.gguf\n", "Unsloth: [2] Converting GGUF 16bit into q4_k_m. This will take 20 minutes...\n", "main: build = 2927 (511182ea)\n", "main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu\n", "main: quantizing './Arun1982/LLama3-LoRA-unsloth.F16.gguf' to './Arun1982/LLama3-LoRA-unsloth.Q4_K_M.gguf' as Q4_K_M using 4 threads\n", "llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from ./Arun1982/LLama3-LoRA-unsloth.F16.gguf (version GGUF V3 (latest))\n", "llama_model_loader: Dumping metadata keys/values. 
Note: KV overrides do not apply in this output.\n", "llama_model_loader: - kv 0: general.architecture str = llama\n", "llama_model_loader: - kv 1: general.name str = LLama3-LoRA\n", "llama_model_loader: - kv 2: llama.block_count u32 = 32\n", "llama_model_loader: - kv 3: llama.context_length u32 = 8192\n", "llama_model_loader: - kv 4: llama.embedding_length u32 = 4096\n", "llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336\n", "llama_model_loader: - kv 6: llama.attention.head_count u32 = 32\n", "llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8\n", "llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000\n", "llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010\n", "llama_model_loader: - kv 10: general.file_type u32 = 1\n", "llama_model_loader: - kv 11: llama.vocab_size u32 = 128256\n", "llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128\n", "llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2\n", "llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe\n", "llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = [\"!\", \"\\\"\", \"#\", \"$\", \"%\", \"&\", \"'\", ...\n", "llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...\n", "llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = [\"Δ  Δ \", \"Δ  Δ Δ Δ \", \"Δ Δ  Δ Δ \", \"...\n", "llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000\n", "llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001\n", "llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 128255\n", "llama_model_loader: - kv 21: general.quantization_version u32 = 2\n", "llama_model_loader: - type f32: 65 tensors\n", "llama_model_loader: - type f16: 226 tensors\n", "[ 1/ 291] token_embd.weight - [ 4096, 128256, 1, 1], type = f16, converting to q4_K .. size = 1002.00 MiB -> 281.81 MiB\n", "[ 2/ 291] blk.0.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 3/ 291] blk.0.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 4/ 291] blk.0.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 5/ 291] blk.0.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 6/ 291] blk.0.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 7/ 291] blk.0.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 8/ 291] blk.0.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 9/ 291] blk.0.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 10/ 291] blk.0.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 11/ 291] blk.1.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 12/ 291] blk.1.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 13/ 291] blk.1.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 14/ 291] blk.1.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. 
size = 32.00 MiB -> 9.00 MiB\n", "[ 15/ 291] blk.1.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 16/ 291] blk.1.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 17/ 291] blk.1.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 18/ 291] blk.1.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 19/ 291] blk.1.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 20/ 291] blk.2.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 21/ 291] blk.2.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 22/ 291] blk.2.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 23/ 291] blk.2.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 24/ 291] blk.2.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 25/ 291] blk.2.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 26/ 291] blk.2.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 27/ 291] blk.2.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 28/ 291] blk.2.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 29/ 291] blk.3.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 30/ 291] blk.3.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 31/ 291] blk.3.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 32/ 291] blk.3.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 33/ 291] blk.3.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 34/ 291] blk.3.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 35/ 291] blk.3.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 36/ 291] blk.3.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 37/ 291] blk.3.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 38/ 291] blk.4.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 39/ 291] blk.4.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 40/ 291] blk.4.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 41/ 291] blk.4.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 42/ 291] blk.4.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 43/ 291] blk.4.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 44/ 291] blk.4.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. 
size = 112.00 MiB -> 31.50 MiB\n", "[ 45/ 291] blk.4.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 46/ 291] blk.4.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 47/ 291] blk.5.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 48/ 291] blk.5.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 49/ 291] blk.5.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 50/ 291] blk.5.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 51/ 291] blk.5.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 52/ 291] blk.5.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 53/ 291] blk.5.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 54/ 291] blk.5.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 55/ 291] blk.5.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 56/ 291] blk.6.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 57/ 291] blk.6.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 58/ 291] blk.6.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 59/ 291] blk.6.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 60/ 291] blk.6.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 61/ 291] blk.6.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 62/ 291] blk.6.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 63/ 291] blk.6.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 64/ 291] blk.6.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 65/ 291] blk.7.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 66/ 291] blk.7.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 67/ 291] blk.7.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 68/ 291] blk.7.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 69/ 291] blk.7.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 70/ 291] blk.7.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 71/ 291] blk.7.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 72/ 291] blk.7.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 73/ 291] blk.7.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 74/ 291] blk.8.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 75/ 291] blk.8.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. 
size = 8.00 MiB -> 2.25 MiB\n", "[ 76/ 291] blk.8.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 77/ 291] blk.8.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 78/ 291] blk.8.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 79/ 291] blk.8.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 80/ 291] blk.8.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 81/ 291] blk.8.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 82/ 291] blk.8.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 83/ 291] blk.9.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 84/ 291] blk.9.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 85/ 291] blk.9.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 86/ 291] blk.9.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 87/ 291] blk.9.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 88/ 291] blk.9.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 89/ 291] blk.9.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 90/ 291] blk.9.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 91/ 291] blk.9.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 92/ 291] blk.10.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 93/ 291] blk.10.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 94/ 291] blk.10.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 95/ 291] blk.10.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 96/ 291] blk.10.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 97/ 291] blk.10.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 98/ 291] blk.10.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 99/ 291] blk.10.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 100/ 291] blk.10.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 101/ 291] blk.11.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 102/ 291] blk.11.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 103/ 291] blk.11.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 104/ 291] blk.11.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 105/ 291] blk.11.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. 
size = 112.00 MiB -> 31.50 MiB\n", "[ 106/ 291] blk.11.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 107/ 291] blk.11.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 108/ 291] blk.11.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 109/ 291] blk.11.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 110/ 291] blk.12.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 111/ 291] blk.12.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 112/ 291] blk.12.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 113/ 291] blk.12.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 114/ 291] blk.12.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 115/ 291] blk.12.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 116/ 291] blk.12.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 117/ 291] blk.12.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 118/ 291] blk.12.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 119/ 291] blk.13.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 120/ 291] blk.13.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 121/ 291] blk.13.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 122/ 291] blk.13.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 123/ 291] blk.13.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 124/ 291] blk.13.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 125/ 291] blk.13.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 126/ 291] blk.13.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 127/ 291] blk.13.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 128/ 291] blk.14.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 129/ 291] blk.14.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 130/ 291] blk.14.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 131/ 291] blk.14.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 132/ 291] blk.14.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 133/ 291] blk.14.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 134/ 291] blk.14.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. 
size = 112.00 MiB -> 31.50 MiB\n", "[ 135/ 291] blk.14.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 136/ 291] blk.14.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 137/ 291] blk.15.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 138/ 291] blk.15.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 139/ 291] blk.15.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 140/ 291] blk.15.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 141/ 291] blk.15.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 142/ 291] blk.15.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 143/ 291] blk.15.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 144/ 291] blk.15.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 145/ 291] blk.15.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 146/ 291] blk.16.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 147/ 291] blk.16.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 148/ 291] blk.16.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 149/ 291] blk.16.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 150/ 291] blk.16.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 151/ 291] blk.16.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 152/ 291] blk.16.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 153/ 291] blk.16.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 154/ 291] blk.16.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 155/ 291] blk.17.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 156/ 291] blk.17.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 157/ 291] blk.17.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 158/ 291] blk.17.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 159/ 291] blk.17.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 160/ 291] blk.17.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 161/ 291] blk.17.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 162/ 291] blk.17.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 163/ 291] blk.17.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 164/ 291] blk.18.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. 
size = 32.00 MiB -> 9.00 MiB\n", "[ 165/ 291] blk.18.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 166/ 291] blk.18.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 167/ 291] blk.18.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 168/ 291] blk.18.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 169/ 291] blk.18.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 170/ 291] blk.18.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 171/ 291] blk.18.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 172/ 291] blk.18.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 173/ 291] blk.19.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 174/ 291] blk.19.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 175/ 291] blk.19.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 176/ 291] blk.19.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 177/ 291] blk.19.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 178/ 291] blk.19.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 179/ 291] blk.19.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 180/ 291] blk.19.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 181/ 291] blk.19.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 182/ 291] blk.20.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 183/ 291] blk.20.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 184/ 291] blk.20.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 185/ 291] blk.20.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 186/ 291] blk.20.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 187/ 291] blk.20.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 188/ 291] blk.20.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 189/ 291] blk.20.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 190/ 291] blk.20.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 191/ 291] blk.21.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 192/ 291] blk.21.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 193/ 291] blk.21.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 194/ 291] blk.21.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. 
size = 32.00 MiB -> 9.00 MiB\n", "[ 195/ 291] blk.21.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 196/ 291] blk.21.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 197/ 291] blk.21.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 198/ 291] blk.21.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 199/ 291] blk.21.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 200/ 291] blk.22.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 201/ 291] blk.22.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 202/ 291] blk.22.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 203/ 291] blk.22.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 204/ 291] blk.22.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 205/ 291] blk.22.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 206/ 291] blk.22.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 207/ 291] blk.22.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 208/ 291] blk.22.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 209/ 291] blk.23.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 210/ 291] blk.23.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 211/ 291] blk.23.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 212/ 291] blk.23.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 213/ 291] blk.23.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 214/ 291] blk.23.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 215/ 291] blk.23.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 216/ 291] blk.23.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 217/ 291] blk.23.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 218/ 291] blk.24.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 219/ 291] blk.24.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 220/ 291] blk.24.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 221/ 291] blk.24.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 222/ 291] blk.24.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 223/ 291] blk.24.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. 
size = 112.00 MiB -> 31.50 MiB\n", "[ 224/ 291] blk.24.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 225/ 291] blk.24.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 226/ 291] blk.24.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 227/ 291] blk.25.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 228/ 291] blk.25.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 229/ 291] blk.25.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 230/ 291] blk.25.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 231/ 291] blk.25.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 232/ 291] blk.25.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 233/ 291] blk.25.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 234/ 291] blk.25.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 235/ 291] blk.25.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 236/ 291] blk.26.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 237/ 291] blk.26.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 238/ 291] blk.26.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 239/ 291] blk.26.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 240/ 291] blk.26.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 241/ 291] blk.26.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 242/ 291] blk.26.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 243/ 291] blk.26.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 244/ 291] blk.26.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 245/ 291] blk.27.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 246/ 291] blk.27.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 247/ 291] blk.27.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 248/ 291] blk.27.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 249/ 291] blk.27.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 250/ 291] blk.27.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 251/ 291] blk.27.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. 
size = 112.00 MiB -> 45.94 MiB\n", "[ 252/ 291] blk.27.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 253/ 291] blk.27.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 254/ 291] blk.28.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 255/ 291] blk.28.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 256/ 291] blk.28.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 257/ 291] blk.28.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 258/ 291] blk.28.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 259/ 291] blk.28.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 260/ 291] blk.28.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 261/ 291] blk.28.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 262/ 291] blk.28.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 263/ 291] blk.29.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 264/ 291] blk.29.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 265/ 291] blk.29.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 266/ 291] blk.29.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 267/ 291] blk.29.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 268/ 291] blk.29.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 269/ 291] blk.29.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 270/ 291] blk.29.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 271/ 291] blk.29.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 272/ 291] blk.30.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 273/ 291] blk.30.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 274/ 291] blk.30.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 275/ 291] blk.30.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 276/ 291] blk.30.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 277/ 291] blk.30.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 278/ 291] blk.30.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 279/ 291] blk.30.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 280/ 291] blk.30.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 281/ 291] blk.31.attn_q.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. 
size = 32.00 MiB -> 9.00 MiB\n", "[ 282/ 291] blk.31.attn_k.weight - [ 4096, 1024, 1, 1], type = f16, converting to q4_K .. size = 8.00 MiB -> 2.25 MiB\n", "[ 283/ 291] blk.31.attn_v.weight - [ 4096, 1024, 1, 1], type = f16, converting to q6_K .. size = 8.00 MiB -> 3.28 MiB\n", "[ 284/ 291] blk.31.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_K .. size = 32.00 MiB -> 9.00 MiB\n", "[ 285/ 291] blk.31.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 286/ 291] blk.31.ffn_up.weight - [ 4096, 14336, 1, 1], type = f16, converting to q4_K .. size = 112.00 MiB -> 31.50 MiB\n", "[ 287/ 291] blk.31.ffn_down.weight - [14336, 4096, 1, 1], type = f16, converting to q6_K .. size = 112.00 MiB -> 45.94 MiB\n", "[ 288/ 291] blk.31.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 289/ 291] blk.31.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 290/ 291] output_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB\n", "[ 291/ 291] output.weight - [ 4096, 128256, 1, 1], type = f16, converting to q6_K .. size = 1002.00 MiB -> 410.98 MiB\n", "llama_model_quantize_internal: model size = 15317.02 MB\n", "llama_model_quantize_internal: quant size = 4685.30 MB\n", "\n", "main: quantize time = 1009833.12 ms\n", "main: total time = 1009833.12 ms\n", "Unsloth: Conversion completed! Output location: ./Arun1982/LLama3-LoRA-unsloth.Q4_K_M.gguf\n", "Unsloth: Uploading GGUF to Huggingface Hub...\n" ] }, { "output_type": "display_data", "data": { "text/plain": [ "LLama3-LoRA-unsloth.Q4_K_M.gguf: 0%| | 0.00/4.92G [00:00
<?, ?B/s]