Loser Cheems
JingzeShi
AI & ML interests
I like training small language models.
Recent Activity
updated a model about 9 hours ago: SmallDoge/Doge-160M-Reason-Distill
published a model about 9 hours ago: SmallDoge/Doge-160M-Reason-Distill
liked a model about 21 hours ago: SmallDoge/Doge-160M-Instruct
Organizations
JingzeShi's activity

reacted to prithivMLmods's post with 🔥 9 days ago

reacted to prithivMLmods's post with 🤗 9 days ago
Post
QwQ Edge Gets a Small Update..! 💬
try now: prithivMLmods/QwQ-Edge
Now, you can use the following commands for different tasks:
🖼️ @image 'prompt...' → Generates an image
@tts1 'prompt...' → Generates speech in a female voice
@tts2 'prompt...' → Generates speech in a male voice
🅰️ @text 'prompt...' → Enables textual conversation (if no command is specified, text-to-text generation is the default mode)
💬 Multimodality support: prithivMLmods/Qwen2-VL-OCR-2B-Instruct
💬 For text generation, the FastThink-0.5B model ensures quick and efficient responses: prithivMLmods/FastThink-0.5B-Tiny
💬 Image generation: SDXL Lightning model, SG161222/RealVisXL_V4.0_Lightning
GitHub: https://github.com/PRITHIVSAKTHIUR/QwQ-Edge
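For the image path, here is a minimal diffusers sketch of how an SDXL Lightning-style checkpoint such as SG161222/RealVisXL_V4.0_Lightning could be called. This is illustrative only and assumes the checkpoint is available in diffusers format; the step count and guidance scale are typical Lightning-style values, not the exact settings used by the QwQ-Edge Space.

```python
# Illustrative sketch: load an SDXL Lightning-style checkpoint with diffusers.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0_Lightning",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a cozy cabin in the mountains at sunset",
    num_inference_steps=6,   # Lightning-distilled checkpoints need only a few steps
    guidance_scale=2.0,      # low CFG is typical for Lightning-distilled models
).images[0]
image.save("cabin.png")
```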
graph TD
A[User Interface] --> B[Chat Logic]
B --> C{Command Type}
C -->|Text| D[FastThink-0.5B]
C -->|Image| E[Qwen2-VL-OCR-2B]
C -->|@image| F[Stable Diffusion XL]
C -->|@tts| G[Edge TTS]
D --> H[Response]
E --> H
F --> H
G --> H
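Reading the diagram above, the routing boils down to a prefix check on the incoming message. Below is a minimal sketch of that idea; the handler functions are hypothetical stand-ins, not the actual QwQ-Edge code.

```python
# Minimal sketch of prefix-based command routing, as described in the post.
# The handlers below are hypothetical stand-ins for the real backends.

def handle_text(prompt: str) -> str:
    return f"[FastThink-0.5B would answer]: {prompt}"

def handle_image(prompt: str) -> str:
    return f"[Stable Diffusion XL would render]: {prompt}"

def handle_tts(prompt: str, voice: str) -> str:
    return f"[Edge TTS would speak in a {voice} voice]: {prompt}"

HANDLERS = {
    "@image": handle_image,
    "@tts1": lambda p: handle_tts(p, "female"),
    "@tts2": lambda p: handle_tts(p, "male"),
    "@text": handle_text,
}

def dispatch(message: str) -> str:
    """Route a chat message to the right backend based on its @command prefix."""
    command, _, rest = message.partition(" ")
    if command in HANDLERS:
        return HANDLERS[command](rest.strip())
    # No recognized command: fall back to text-to-text generation (the default mode).
    return handle_text(message)

print(dispatch("@image a corgi wearing sunglasses"))
print(dispatch("hello there"))
```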

posted an update 17 days ago
Post
Welcome to the Doge Face Open Source Community!
Our goal for the next two years is to explore the indispensable foundation of embodied intelligence: small language models.
We aim to open-source code and documentation to give everyone more time to slack off while working or studying!
Repository on GitHub: https://github.com/SmallDoges/small-doge
Organization on Hugging Face: https://huggingface.co./SmallDoge

replied to their post 29 days ago
But you are an internet celebrity rapper

replied to their post 29 days ago
The process is always hard, but the result is always good.

posted an update 29 days ago
Post
🤩 warmup -> stable -> decay learning rate scheduler:
Use the stable-phase checkpoints to continue training the model on any new dataset without loss spikes during training!
SmallDoge/Doge-20M-checkpoint
SmallDoge/Doge-60M-checkpoint
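For context, a warmup-stable-decay (WSD) schedule holds the learning rate flat after warmup and only decays it at the very end, which is why a checkpoint saved during the stable phase can be resumed on new data without a loss spike. Below is a minimal PyTorch sketch of such a schedule; it is illustrative, not the exact scheduler used to train the Doge checkpoints.

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

def get_wsd_schedule(optimizer, num_warmup_steps, num_stable_steps,
                     num_decay_steps, min_lr_ratio=0.0):
    """Warmup -> stable -> decay (WSD) learning-rate schedule.

    - warmup: LR rises linearly from 0 to the peak value
    - stable: LR stays at the peak (checkpoints saved here can be resumed
      on a new dataset, since no decay has happened yet)
    - decay: LR anneals from the peak down to min_lr_ratio * peak (cosine here)
    """
    def lr_lambda(step):
        if step < num_warmup_steps:
            return step / max(1, num_warmup_steps)
        if step < num_warmup_steps + num_stable_steps:
            return 1.0
        progress = (step - num_warmup_steps - num_stable_steps) / max(1, num_decay_steps)
        progress = min(progress, 1.0)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return min_lr_ratio + (1.0 - min_lr_ratio) * cosine
    return LambdaLR(optimizer, lr_lambda)

# Example: 1k warmup, 8k stable, 1k decay steps on a dummy parameter.
opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=3e-4)
sched = get_wsd_schedule(opt, 1_000, 8_000, 1_000)
```

To continue training, a stable-phase checkpoint can be loaded like any other Hugging Face model, e.g. AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M-checkpoint", trust_remote_code=True) (trust_remote_code is an assumption here, since Doge ships custom model code), and trained at the stable-phase learning rate before entering the decay phase.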

reacted to anakin87's post 30 days ago
Post
New Italian Small Language Models: Gemma Neogenesis Collection 🇮🇹
I am happy to release two new language models for the Italian language!
💪 Gemma 2 9B Neogenesis ITA
anakin87/gemma-2-9b-neogenesis-ita
Building on the impressive work by VAGO Solutions, I applied Direct Preference Optimization with a mix of Italian and English data.
Using Spectrum, I trained 20% of the model's layers.
Evaluated on the Open ITA LLM leaderboard (mii-llm/open_ita_llm_leaderboard), this model achieves strong performance.
To beat it on this benchmark, you'd need a 27B model.
Gemma 2 2B Neogenesis ITA
anakin87/gemma-2-2b-neogenesis-ita
This smaller variant is fine-tuned from the original Gemma 2 2B it by Google.
Through a combination of Supervised Fine-Tuning and Direct Preference Optimization, I trained 25% of the layers using Spectrum.
Compared to the original model, it shows improved Italian proficiency, which is good for its small size.
Both models were developed during the recent #gemma competition on Kaggle.
Training code: https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond
Thanks to @FinancialSupport and mii-llm for the help during evaluation.
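Both models above were aligned with Direct Preference Optimization. As a rough illustration of that objective (not anakin87's actual training code, which is linked on Kaggle above), the core DPO loss over precomputed sequence log-probabilities looks like this:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is a tensor of per-example sequence log-probabilities:
    log p(chosen | prompt) and log p(rejected | prompt) under the policy
    being trained and under the frozen reference model.
    """
    # How much more the policy prefers chosen over rejected...
    policy_margin = policy_chosen_logps - policy_rejected_logps
    # ...relative to the reference model's preference.
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Maximize the sigmoid of the scaled advantage (minimize its negative log).
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Tiny shape check with dummy log-probabilities for a batch of 4 pairs.
logps = torch.randn(4)
print(dpo_loss(logps, logps - 1.0, torch.zeros(4), torch.zeros(4)))
```

Spectrum-style training then freezes everything except the most informative subset of layers (selected by signal-to-noise ratio, roughly 20-25% here) before optimizing this objective.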
Cool!!!

replied to their post 30 days ago
So slow

posted an update about 1 month ago
Post
Running model pre-training on only a single RTX 4090 is really slow, even for small language models! (https://huggingface.co./collections/JingzeShi/doge-slm-677fd879f8c4fd0f43e05458)