This is NanoLM-0.3B-Instruct-v1, the first version of NanoLM-0.3B-Instruct.
## Model Details
The tokenizer and model architecture of NanoLM-0.3B-Instruct-v1 are the same as [Qwen/Qwen2-0.5B](https://huggingface.co/Qwen/Qwen2-0.5B), but the number of layers has been reduced from 24 to 12. As a result, NanoLM-0.3B-Instruct-v1 has only 0.3 billion parameters, with approximately **180 million non-embedding parameters**. Despite this, NanoLM-0.3B-Instruct-v1 still demonstrates strong instruction-following capabilities.
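As a rough sanity check, the quoted counts can be reproduced with back-of-the-envelope arithmetic, assuming the published Qwen2-0.5B configuration (hidden size 896, 14 query / 2 key-value heads, intermediate size 4864, vocabulary 151,936, tied embeddings) with the layer count halved:

```python
# Back-of-the-envelope parameter count for NanoLM-0.3B-Instruct-v1,
# assuming the Qwen2-0.5B configuration with the layer count halved.

hidden = 896                 # hidden_size
heads = 14                   # num_attention_heads
kv_heads = 2                 # num_key_value_heads (GQA)
head_dim = hidden // heads   # 64
intermediate = 4864          # intermediate_size (SwiGLU MLP)
vocab = 151_936              # vocab_size (tied input/output embeddings)
layers = 12                  # reduced from Qwen2-0.5B's 24

kv_dim = kv_heads * head_dim  # 128

# Attention: Q/K/V projections carry biases in Qwen2, O does not.
attn = hidden * hidden + hidden          # Q
attn += 2 * (hidden * kv_dim + kv_dim)   # K and V
attn += hidden * hidden                  # O
# SwiGLU MLP: gate, up, and down projections (no biases).
mlp = 3 * hidden * intermediate
# Two RMSNorm weight vectors per layer.
norms = 2 * hidden

per_layer = attn + mlp + norms
non_embedding = layers * per_layer + hidden  # + final RMSNorm
embedding = vocab * hidden                   # tied, so counted once
total = non_embedding + embedding

print(f"non-embedding: {non_embedding / 1e6:.0f}M")  # ~179M
print(f"total:         {total / 1e9:.2f}B")          # ~0.32B
```

With 12 layers this lands at roughly 179M non-embedding parameters and about 0.32B in total, consistent with the figures above.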
Here are some examples. For reproducibility purposes, I've set `do_sample` to `False`. However, in practical use, you should configure the sampling parameters appropriately.
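A minimal sketch of how such examples might be reproduced with 🤗 Transformers is shown below. The repository id is an assumption (replace it with the actual checkpoint path), and the chat-template usage follows the standard Qwen2-style convention:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id -- substitute the actual checkpoint path.
model_id = "NanoLM-0.3B-Instruct-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain gravity in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# do_sample=False gives deterministic (greedy) decoding for reproducibility;
# in practice, enable sampling and tune temperature/top_p instead.
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```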