Prithiv Sakthi PRO

prithivMLmods

https://huggingface.co./strangerzonehf

AI & ML interests

computer vision, multimodality, realism engine adapters @strangerzonehf

Recent Activity

new activity about 4 hours ago

prithivMLmods/Calcium-Opus-14B-Elite2-R1: Adding Evaluation Results

updated a model about 4 hours ago

prithivMLmods/Calcium-Opus-14B-Elite2-R1

updated a model about 20 hours ago

prithivMLmods/LatexMind-2B-Codec

Articles

Unlocking Creativity with Text-to-Image Generation: Exploring LoRA Models and Styles [Generative Vision]

Aug 8, 2024

Organizations

prithivMLmods's activity

posted an update about 21 hours ago

o3-mini and DeepSeek-R1
Tested on a few famous and quirky examples.

🔥Blog: https://huggingface.co./blog/prithivMLmods/o3-mini-vs-deepseek-r1

Prompt: Using HTML, CSS, and JavaScript in a single HTML file to create a simulation of the solar system. Pay extreme attention to the UI to make it as intuitive as possible. Ensure that every planet appears as a sphere and is labeled with its corresponding name.

Example 1: o3-mini; Example 2: DeepSeek-R1

Q2: https://huggingface.co./blog/prithivMLmods/o3-mini-vs-deepseek-r1#q2--web-solar-system-explorer

reacted to jasoncorkill's post with 🚀 3 days ago

We benchmarked @xai-org's Aurora model; as far as we know, this is the first public evaluation of the model at scale.

We collected 401k human annotations over the past ~2 days for this, and we have uploaded all of the annotation data here on Hugging Face under a fully permissive license:
Rapidata/xAI_Aurora_t2i_human_preferences

reacted to mkurman's post with ❤️ 3 days ago

Blurred-Thoughts Supervised Fine-Tuning (BT-SFT) 🤖

Can we teach a model to think completely on its own without reinforcement learning? Actually, yes.

We can do straightforward supervised fine-tuning using a relatively simple trick: blurring part of the CoT. But why is this effective?

We observed that various models differ in their thinking processes, and fine-tuning one model on another model’s thoughts (CoT) can sometimes be inefficient—often resulting in the model simply memorizing reasoning rather than learning how to actually think.

I found that this process can still be efficient if we clearly mark where the model should start and stop thinking, and uncover only part of the CoT plus the expected answer while blurring the rest of the CoT. This lets the model learn only a portion of the thought process while still arriving at the expected answer.
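A minimal sketch of how one BT-SFT training example could be assembled, assuming that "blurring" means keeping the blurred CoT tokens in the input but excluding them from the loss (label -100, which PyTorch's CrossEntropyLoss ignores by default). The marker ids THINK_START/THINK_END, the random choice of blurred positions, and the blur_ratio parameter are all hypothetical illustrations, not details taken from the post.

```python
import random

IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss skips by default
THINK_START, THINK_END = 1001, 1002  # hypothetical ids for the "start/stop thinking" markers


def build_bt_sft_example(prompt_ids, cot_ids, answer_ids, blur_ratio=0.5, seed=0):
    """Return (input_ids, labels) with part of the CoT blurred out of the loss.

    Assumption: blurred CoT tokens stay visible in the input but get IGNORE_INDEX
    as their label, so the model is supervised only on the uncovered part of the
    CoT, the thinking markers, and the final answer.
    """
    rng = random.Random(seed)
    n_blur = int(len(cot_ids) * blur_ratio)
    blurred = set(rng.sample(range(len(cot_ids)), n_blur))  # hypothetical: random positions

    input_ids = prompt_ids + [THINK_START] + cot_ids + [THINK_END] + answer_ids
    labels = (
        [IGNORE_INDEX] * len(prompt_ids)  # never train on the prompt itself
        + [THINK_START]                   # always learn where thinking starts
        + [IGNORE_INDEX if i in blurred else tok for i, tok in enumerate(cot_ids)]
        + [THINK_END]                     # ...and where it stops
        + answer_ids                      # always learn the expected answer
    )
    return input_ids, labels


if __name__ == "__main__":
    inp, lab = build_bt_sft_example([5, 6], list(range(10, 20)), [42, 43])
    print(len(inp) == len(lab), lab.count(IGNORE_INDEX))  # True 7
```

Masking labels rather than deleting tokens keeps the full reasoning in context while the gradient only comes from the uncovered slice; whether the original BT-SFT run blurs random tokens or contiguous spans is not stated, so the sampling strategy here is a guess.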

To demonstrate this, you can check out my experimental BT-SFT run on the meditsolutions/Llama-3.2-SUN-2.5B-chat model, which was fine-tuned on 151 million tokens from the Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B dataset.

Enjoy! 🚀

PS. If you were curious enough to read this, leave me a comment. It's always nice to chat with open-minded and intelligent people.

reacted to not-lain's post with 🤗 4 days ago

I have just released a new blog post about KV caching and its role in speeding up inference 🚀
🔗 https://huggingface.co./blog/not-lain/kv-caching/
Some takeaways:
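As a rough illustration of the idea (a toy sketch, not code from the blog post): during autoregressive decoding, the key/value projections of earlier tokens never change, so they can be cached and only the newest token's K and V computed at each step. The pure-Python single-head attention below uses made-up 2-d weights; the function names are hypothetical. It checks that cached and uncached decoding produce the same outputs while the cached version does far fewer projections.

```python
import math


def project(x, W):
    """Matrix-vector product W @ x, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all given keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def decode_without_cache(xs, Wq, Wk, Wv):
    outputs, projections = [], 0
    for t in range(len(xs)):
        # recompute K and V for every token seen so far, at every step
        keys = [project(x, Wk) for x in xs[: t + 1]]
        values = [project(x, Wv) for x in xs[: t + 1]]
        projections += 2 * (t + 1)
        outputs.append(attend(project(xs[t], Wq), keys, values))
    return outputs, projections


def decode_with_cache(xs, Wq, Wk, Wv):
    outputs, projections = [], 0
    k_cache, v_cache = [], []
    for x in xs:
        # only the newest token's K and V are computed, then appended to the cache
        k_cache.append(project(x, Wk))
        v_cache.append(project(x, Wv))
        projections += 2
        outputs.append(attend(project(x, Wq), k_cache, v_cache))
    return outputs, projections


if __name__ == "__main__":
    Wq = [[1.0, 0.0], [0.0, 1.0]]
    Wk = [[0.5, 0.5], [0.0, 1.0]]
    Wv = [[1.0, 2.0], [3.0, 4.0]]
    xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    slow, n_slow = decode_without_cache(xs, Wq, Wk, Wv)
    fast, n_fast = decode_with_cache(xs, Wq, Wk, Wv)
    print(slow == fast, n_slow, n_fast)  # True 20 8
```

Without a cache, K/V work grows quadratically with sequence length (2 × (1 + 2 + ... + n) projections); with it, the work is linear (2 × n), at the cost of memory to hold every past K and V.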

reacted to hexgrad's post with 🚀🚀🚀🚀 4 days ago

hexgrad/Kokoro-82M got an upgrade! ⬆️ More voices, more languages, pip install kokoro, and still 82M parameters.

GitHub: https://github.com/hexgrad/kokoro
PyPI: https://pypi.org/project/kokoro/
Space: hexgrad/Kokoro-TTS

replied to their post 5 days ago

I’m not sure, but if this trend continues, the "real" open AI (open-source models) will keep improving relative to paid AI providers. People will increasingly use open-source models for their domain-specific tasks.

posted an update 5 days ago

Deepswipe by
.
.
.
. Deepseek🐬🗿

Everything is now in recovery. 📉📈

reacted to AdinaY's post with 🔥 5 days ago

It’s not just a flood of model releases, papers are dropping just as fast 🚀

Here are the 10 most upvoted papers from the Chinese community:
👉 zh-ai-community/2025-january-papers-679933cbf0f3ced11f5a168a

reacted to m-ric's post with 🔥 6 days ago

𝗧𝗵𝗲 𝗛𝘂𝗯 𝘄𝗲𝗹𝗰𝗼𝗺𝗲𝘀 𝗲𝘅𝘁𝗲𝗿𝗻𝗮𝗹 𝗶𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗽𝗿𝗼𝘃𝗶𝗱𝗲𝗿𝘀!

✅ Hosting our own inference was not enough: the Hub now has 4 new inference providers: fal, Replicate, SambaNova Systems, & Together AI.

Check model cards on the Hub: you can now, in 1 click, use inference from various providers (see the video demo)

Their inference can also be used through our Inference API client. There, you can use either your own provider key or your HF token; with an HF token, billing is handled directly on your HF account, centralizing all expenses.

💸 Also, PRO users get $2 in inference credits per month!

Read more in the announcement 👉 https://huggingface.co./blog/inference-providers

replied to fdaudens's post 6 days ago

Incredible stat 📈

reacted to victor's post with 🚀 6 days ago

Finally, an open-source AI that turns your lyrics into full songs is here—meet YuE! Unlike other tools that only create short clips, YuE can make entire songs (up to 5 minutes) with vocals, melody, and instruments all working together. Letsss go!

m-a-p/YuE-s1-7B-anneal-en-cot

reacted to clem's post with ❤️ 7 days ago

AI is not a zero-sum game. Open-source AI is the tide that lifts all boats!

reacted to fdaudens's post with ❤️ 7 days ago

Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:

- Original release: 8 models, 540K downloads. Just the beginning...

- The community turned those open-weight models into 550+ NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.

The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interestingly, the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.

When you empower builders, innovation explodes. For everyone. 🚀

The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version, with 1M downloads alone.

reacted to nicolay-r's post with 🔥 7 days ago

📢 For those who wish to apply DeepSeek-R1 to tabular / streaming data using a schema of prompts (CoT), OpenRouter hosts an API for accessing it:
https://openrouter.ai/deepseek/deepseek-r1

The no-strings-attached way to quick-start with DeepSeek-R1 takes three steps:
✅ OpenRouter provider: https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/open_router.py
✅ Bulk-chain for inferring over data: https://github.com/nicolay-r/bulk-chain
✅ JSON Schema for Chain-of-Thought reasoning (see screenshot 📷 below)

📺 Below is a screenshot of how to quick-start the demo, in which you can test your schema for LLM responses. It first asks you to type in all the parameters needed to complete the requests (only the text, in this example).

📃 To apply it to JSONL/CSV data, use the --src shell parameter to pass the file.

⏳ As for timing, I find OpenRouter relatively slow, at 30~40 seconds per request.

Models:
deepseek-ai/DeepSeek-R1

reacted to AdinaY's post with 🔥 9 days ago

Baichuan is making big moves today 🔥

✨ Launched an All-Scenario Reasoning Model (language, visual, and search reasoning capabilities), with medical expertise as one of its key highlights.
https://ying.baichuan-ai.com/chat

✨ Released Baichuan-M1-14B Medical LLM on the hub
Available in both Base and Instruct versions, supporting English & Chinese.

Model:
baichuan-inc/Baichuan-M1-14B-Base
baichuan-inc/Baichuan-M1-14B-Instruct

reacted to burtenshaw's post with 🤯 10 days ago

AI was built on side projects!

reacted to AdinaY's post with 🔥 11 days ago

VideoLLaMA 3 🔥 Multimodal foundation models for Image and Video Understanding by Alibaba DAMO

Model: DAMO-NLP-SG/videollama3-678cdda9281a0e32fe79af15
Paper: VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding (2501.13106)

✨ 2B/7B
✨ Apache 2.0

Articles

o3-mini vs Deepseek-R1

Why PyThagorean? 🐍

N-Queens Problem Based Monte Carlo Algorithm

Fine-tune SmolLM's on custom synthetic data

GRID-6X : Layout for Seamless Image Assembly

Flux1.1 [pro] Ultra : Endpoint by BFL ⛵

Create Dynamic Typed Videos with 'Type Byte🐧'

Unlocking Creativity with Text-to-Image Generation: Exploring LoRA Models and Styles [Generative Vision]