
Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝)・Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Recent Activity

liked a dataset about 20 hours ago
open-thoughts/OpenThoughts-114k

Organizations

None yet

nicolay-r's activity

reacted to ggbetz's post with 👀 about 20 hours ago
We've just released syncIALO -- a multi-purpose synthetic debate and argument mapping corpus with more than 600k arguments:

📝 Blog article: https://huggingface.co./blog/ggbetz/introducing-syncialo
🛢️ Dataset: DebateLabKIT/syncialo-raw
👩‍💻 Code: https://github.com/debatelab/syncIALO

🤗 Hugging Face has sponsored the syncIALO project through inference time / compute credits. 🙏 We gratefully acknowledge the generous support. 🫶
replied to their post about 22 hours ago

@claudiohgdotta, thanks, edited!
That would be too much to expect from Qwen-2.5-MAX,
especially given how fast the Qwen demo inference is.

posted an update 2 days ago
📢 Qwen has released Qwen-2.5-MAX, which claims to outperform DeepSeek-V3 [Edited: not R1].
Here is how you can start applying it to handle CSV / JSONL data.
The model is served through an OpenAI-compatible API, so here is my wrapper for it:
🌌 https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/openai_156.py

🚀 All you have to do is set
base-url: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
and the API key of the platform.

↗️ Below is the link to the complete example (see screenshot):
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_qwen_25_max_chat.sh

📰 Source: https://www.alibabacloud.com/help/en/model-studio/developer-reference/what-is-qwen-llm
📺 Official Sandbox Demo: Qwen/Qwen2.5-Max-Demo
📜 Paper: https://arxiv.org/abs/2412.15115
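
For reference, below is a minimal sketch (my own example, not the wrapper linked above) of calling the model through that OpenAI-compatible endpoint with the official openai Python client. The model id "qwen-max" and the sentiment-style prompt are assumptions for illustration.

```python
# Minimal sketch: Qwen-2.5-MAX via the OpenAI-compatible DashScope endpoint.
# Assumptions: the "qwen-max" model id and a DASHSCOPE_API_KEY environment variable.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

def annotate_row(text: str) -> str:
    # In a CSV / JSONL workflow, this call would be repeated once per row.
    response = client.chat.completions.create(
        model="qwen-max",
        messages=[{"role": "user", "content": f"Extract the sentiment of: {text}"}],
    )
    return response.choices[0].message.content

print(annotate_row("The new release works surprisingly well."))
```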
reacted to singhsidhukuldeep's post with 🚀 2 days ago
Exciting Research Alert: Revolutionizing Complex Information Retrieval!

A groundbreaking paper from researchers at MIT, AWS AI, and UPenn introduces ARM (Alignment-Oriented LLM-based Retrieval Method), a novel approach to tackle complex information retrieval challenges.

>> Key Innovations

Information Alignment
The method first decomposes queries into keywords and aligns them with available data using both BM25 and embedding similarity, ensuring comprehensive coverage of information needs.
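
As a rough illustration of this hybrid alignment idea (my sketch, not the authors' code), the snippet below scores candidate data objects with both BM25 and embedding similarity. The rank_bm25 and sentence-transformers packages, the toy corpus, and the equal weighting are all assumptions.

```python
# Minimal sketch of hybrid keyword-to-object alignment: lexical BM25 + dense embeddings.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = [
    "table: employees(id, name, department, salary)",
    "table: departments(id, name, budget)",
    "document: annual budget report for engineering",
]
keywords = "engineering department budget"

# Lexical signal: BM25 over whitespace-tokenized objects.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
lexical = bm25.get_scores(keywords.lower().split())

# Semantic signal: cosine similarity of normalized sentence embeddings.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = encoder.encode(corpus, normalize_embeddings=True)
query_emb = encoder.encode(keywords, normalize_embeddings=True)
semantic = doc_emb @ query_emb

# Blend both signals; the 0.5/0.5 weighting is arbitrary, not a value from the paper.
lexical = lexical / (lexical.max() + 1e-9)
scores = 0.5 * lexical + 0.5 * semantic
print(corpus[int(np.argmax(scores))])
```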

Structure Alignment
ARM employs a sophisticated mixed-integer programming solver to identify connections between data objects, exploring relationships beyond simple semantic matching.

Self-Verification
The system includes a unique self-verification mechanism where the LLM evaluates and aggregates results from multiple retrieval paths, ensuring accuracy and completeness.

>> Performance Highlights

The results are impressive:
- Outperforms standard RAG by up to 5.2 points in execution accuracy on the BIRD dataset
- Achieves 19.3 points higher F1 scores compared to existing approaches on OTT-QA
- Reduces the number of required LLM calls while maintaining superior retrieval quality

>> Technical Implementation

The system uses a three-step process:
1. N-gram indexing and embedding computation for all data objects
2. Constrained beam decoding for information alignment
3. Mixed-integer programming optimization for structure exploration

This research represents a significant step forward in making complex information retrieval more efficient and accurate. The team's work demonstrates how combining traditional optimization techniques with modern LLM capabilities can solve challenging retrieval problems.
reacted to JingzeShi's post with 🤗 2 days ago
Welcome to the Doge Face Open Source Community! 🚀
Our goal for the next two years is to explore the indispensable foundation of embodied intelligence: small language models. 🔬
We aim to open-source code and documentation to give everyone more time to slack off while working or studying! 🤗
👉 Repository on GitHub: https://github.com/SmallDoges/small-doge
👉 Organization on Hugging Face: https://huggingface.co./SmallDoge
reacted to csabakecskemeti's post with 👀 2 days ago
Check out my idea:
LLmaaS - Local LLM as a Service

With LLmaaS, I propose leveraging locally running LLMs as a service, providing a standardized way for websites to access and utilize them for LLM-powered operations directly on the user’s device.

Demo, code, and a more detailed description:
https://devquasar.com/llmaas/
https://github.com/csabakecskemeti/LLmaaS
https://youtu.be/OOWGr8jcP5Q
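
To make the idea concrete, here is a minimal sketch of such a browser-facing proxy (my illustration, not the LLmaaS code). It assumes an Ollama-style local endpoint at http://localhost:11434/api/generate and a hard-coded model name; a real deployment would lock down the CORS policy as part of the security measures mentioned in the call for contributors below.

```python
# Minimal sketch of a proxy that lets a web page call a locally running LLM.
# Assumptions: an Ollama-style endpoint and a non-streaming, single-turn request.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

LOCAL_LLM_URL = "http://localhost:11434/api/generate"  # hypothetical local service

class ProxyHandler(BaseHTTPRequestHandler):
    def _cors_headers(self):
        # Wide-open CORS for the demo; restricting allowed origins is the obvious hardening step.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):
        # Answer the browser's CORS preflight request.
        self.send_response(204)
        self._cors_headers()
        self.end_headers()

    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        prompt = json.loads(body).get("prompt", "")
        payload = json.dumps({"model": "llama3", "prompt": prompt, "stream": False}).encode()
        req = Request(LOCAL_LLM_URL, data=payload, headers={"Content-Type": "application/json"})
        with urlopen(req) as resp:
            answer = json.loads(resp.read()).get("response", "")
        reply = json.dumps({"response": answer}).encode()
        self.send_response(200)
        self._cors_headers()
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), ProxyHandler).serve_forever()
```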

Call for contributors
Join me in developing the LLmaaS proxy into a general-purpose tool for leveraging local LLMs on the web, with built-in security measures.
I'm looking for help to make the proxy generic enough to support multiple local LLM services without any change on the HTML side.
Also looking for ideas on how to make the HTML part more modular and easy to use.
reacted to fdaudens's post with 👀 2 days ago
📊 R1 just built its own download dashboard!

Some fresh stats: +6M downloads for 800+ derivative models vs 2M for originals. Watch the numbers grow here: fdaudens/deepseek-download-stats
reacted to Pendrokar's post with ❤️ 2 days ago
TTS: Added Kokoro v1, Parler Large, LlaSa 3B & MARS 6 TTS models to the Arena.
Pendrokar/TTS-Spaces-Arena

I had also added MaskGCT, GPT-SoVITS & OuteTTS a month ago. The OuteTTS devs did say that it is too early for it to be added to TTS arenas.

MARS 5 does have a Space with open-weights models, but inference is way too slow (2+ minutes).
reacted to chansung's post with 👍 2 days ago
Simple Paper Review #5

I briefly reviewed the paper "SFT Memorizes, RL Generalizes," which compares SFT and RL in LLM/VLM post-training; it comes from HKU, UC Berkeley, Google DeepMind, and New York University.

The conclusion suggests SFT excels at memorization, while RL is better for generalization. However, since LLMs/VLMs should benefit humans beyond just generalization, a mix of SFT and RL is advisable. Typically, some SFT is done first so the model learns prompt formats, followed by RL to enhance generalization through trial and error.

The study focused on one model, Llama-3.2-Vision-11B, using environments like General Points for arithmetic reasoning and V-IRL for spatial reasoning. Training data was used for both SFT and RL, with evaluations on in-distribution and out-of-distribution data to assess memorization and generalization.

I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as the effort will directly influence the outcomes.

https://arxiv.org/abs/2501.17161
posted an update 3 days ago
📢 If you are looking forward to reviewing ✍️ the @deepseek_ai R1 model in your studies 📰, the post cited below will be helpful. It breaks down 🔨 all the key concepts within just a single paragraph.

📜 Original paper: https://arxiv.org/abs/2501.12948
reacted to singhsidhukuldeep's post with 🔥 3 days ago
Excited to share groundbreaking research in Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG)!

Researchers from the University of Science and Technology of China have developed FRAG - a novel flexible modular framework that revolutionizes how Large Language Models (LLMs) reason with knowledge graphs.

What makes FRAG special? It intelligently adapts retrieval strategies based on query complexity without requiring expensive KG fine-tuning. The framework uses a reasoning-aware module to classify queries as simple or complex, then applies tailored retrieval pipelines.

Under the hood:
- For simple queries: Uses breadth-first search and ranking to efficiently find relevant paths
- For complex queries: Employs shortest path algorithms to minimize computational overhead
- Features a preprocessing-retrieval-postprocessing pipeline with flexible components
- Leverages traditional algorithms like PersonalizedPageRank for subgraph extraction (see the sketch below)
- Implements edge and path ranking models for precise information filtering
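
For the subgraph-extraction step, here is a minimal sketch (my illustration, not the FRAG implementation) of Personalized PageRank over a toy graph using networkx; the top-k cutoff and the uniform seed weighting are assumptions.

```python
# Minimal sketch: extract a query-focused subgraph with Personalized PageRank.
import networkx as nx

def ppr_subgraph(graph, seed_entities, top_k=20):
    # Concentrate the restart probability on the entities mentioned in the query.
    personalization = {node: (1.0 if node in seed_entities else 0.0) for node in graph}
    scores = nx.pagerank(graph, alpha=0.85, personalization=personalization)
    # Keep the top-k highest-scoring nodes and the edges among them.
    keep = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return graph.subgraph(keep).copy()

if __name__ == "__main__":
    g = nx.karate_club_graph()  # toy stand-in for a real knowledge graph
    sub = ppr_subgraph(g, seed_entities=[0, 33], top_k=10)
    print(sub.number_of_nodes(), "nodes,", sub.number_of_edges(), "edges")
```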

The results are impressive - FRAG achieves state-of-the-art performance while maintaining high efficiency and low resource consumption. On benchmark datasets like WebQSP and CWQ, it outperforms existing approaches by significant margins.

Most importantly, FRAG maintains flexibility and modularity while improving retrieval quality - no expensive LLM fine-tuning required! This makes it highly practical for real-world applications.

This work represents a major step forward in making LLMs more reliable and capable of complex reasoning tasks. Looking forward to seeing how this technology evolves!
reacted to prithivMLmods's post with 👀 3 days ago
o3-mini and DeepSeek R1
Worked out with some famous and weird examples.

🔥 Blog: https://huggingface.co./blog/prithivMLmods/o3-mini-vs-deepseek-r1

Prompt : Using HTML, CSS, and JavaScript in a single HTML file to create a simulation of the solar system. Pay extreme attention to the UI to make it as intuitive as possible. Ensure that every planet appears as a sphere and is labeled with its corresponding name.

Example 1: o3-mini; example 2: DeepSeek R1

Q2 : https://huggingface.co./blog/prithivMLmods/o3-mini-vs-deepseek-r1#q2--web-solar-system-explorer
reacted to ezgikorkmaz's post with 👍 3 days ago
reacted to as-cle-bert's post with 👍 4 days ago
Hi Hugging Face community! 🤗

I just published an article in which I try to articulate some counter-points to Dario Amodei's post "On DeepSeek and Export Controls" 👉 https://huggingface.co./blog/as-cle-bert/why-we-dont-need-export-control

I try to address several key passages of the third section of Amodei's post (https://darioamodei.com/on-deepseek-and-export-controls), bringing my perspective on the importance of open source, open knowledge, and multipolarity in a field as crucial for our future as Artificial Intelligence.

Happy reading!✨
reacted to MonsterMMORPG's post with 👀 4 days ago
Paints-UNDO Installers Published - Undo Images Like Drawing From Scratch - 1-Click Install for Windows, RunPod, Massed Compute, Kaggle

Installers shared here : https://www.patreon.com/posts/121228327

Check attached images

PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings

What this app does: you give it an input image, and it tries to mimic how that image could have been drawn by an artist, as a sequence of drawing steps.

The app generates images at individual points in time, as well as a video of the drawing process, as if the image were drawn from scratch.

Official Repo : https://github.com/lllyasviel/Paints-UNDO

We have a Low VRAM mode, and it works great.

1-Click Installers and Gradio APP : https://www.patreon.com/posts/121228327

We have 1-click installers for Windows, RunPod, Massed Compute, and a free Kaggle notebook. Read this post extremely carefully to learn how to use them all.
reacted to kristaller486's post with 🚀 4 days ago
Nebo-T1-Russian

(Probably) the first "longCoT" dataset for the Russian language, created via DeepSeek-R1.

- Prompts taken from the Sky-T1 dataset and translated via Llama3.3-70B.
- Answers and reasoning generated by Deepseek-R1 (685B).
- 16.4K samples in total, ≈12.4K Russian-only (in the rest, either the answer or the reasoning is in English).
- Languages in the answers and reasoning are labeled using fastText (see the sketch below).
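
For the language-labeling step, a minimal sketch (an assumed setup, not necessarily the dataset's exact pipeline) with fastText's pretrained lid.176 identification model could look like this:

```python
# Minimal sketch: label the language of answers/reasoning with fastText.
# Requires `pip install fasttext` and the pretrained lid.176.ftz model from fasttext.cc.
import fasttext

model = fasttext.load_model("lid.176.ftz")

def detect_language(text: str) -> str:
    # fastText expects single-line input; labels come back as e.g. "__label__ru".
    labels, _probs = model.predict(text.replace("\n", " "), k=1)
    return labels[0].replace("__label__", "")

print(detect_language("Сколько будет два плюс два?"))  # -> "ru"
print(detect_language("What is two plus two?"))        # -> "en"
```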

kristaller486/Nebo-T1-Russian
reacted to rubenroy's post with 🔥 4 days ago
reacted to chansung's post with 👍 4 days ago
A brief summary of the o3-mini

The OpenAI o3-mini model is a significant improvement over the o1-mini, reaching o1 performance levels. While generally good, its performance isn't universally better than previous models (o1, o1-prev.) or GPT-4o across all benchmarks. This means workflows should be re-evaluated with each model upgrade.

The o3-mini has "low," "medium," and "high" versions, with "low" being the base model used for benchmarking. It's speculated that the higher versions simply involve more processing. A fair comparison with other models like Gemini 2.0 Thinking or DeepSeek-R1 would likely need to use the "low" version and a similar "think more" mechanism.

The system card is recommended reading due to its comprehensive benchmark data.

https://openai.com/index/openai-o3-mini/
posted an update 4 days ago
📢 The distilled 8B version of DeepSeek R1 based on LLaMA-3.1-8B is available, besides the one based on Qwen.

📙 Notebook for using it to reason over a series of data 🧠:
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_deep_seek_7b_distill_llama3.ipynb

Loading using the pipeline API of the transformers library:
https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_llama.py
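
For reference, a minimal sketch of such pipeline-based loading (my example, not the provider script linked above) might look as follows in recent transformers versions; the FP16 dtype matches the memory figures below.

```python
# Minimal sketch: load the distilled Llama-based R1 model with the transformers pipeline API.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    torch_dtype=torch.float16,  # FP16 keeps memory usage near the figure reported below
    device_map="auto",
)

messages = [{"role": "user", "content": "Is 9.11 larger than 9.9? Think step by step."}]
out = pipe(messages, max_new_tokens=512)
print(out[0]["generated_text"][-1]["content"])
```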
🟑 GPU Usage: 12.3 GB (FP16/FP32 mode) which is suitable for T4. (a 1.5 GB less than Qwen-distilled version)
🐌 Perfomance: T4 instance: ~0.19 tokens/sec (FP32 mode) and (FP16 mode) ~0.22-0.30 tokens/sec. Is it should be that slow? πŸ€”
Model name: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
⭐ Framework: https://github.com/nicolay-r/bulk-chain
🌌 Notebooks and models hub: https://github.com/nicolay-r/nlp-thirdgate
posted an update 5 days ago
🚨 MistralAI is back with the mistral small V3 model update and it is free! πŸ‘
https://docs.mistral.ai/getting-started/models/models_overview/#free-models

πŸš€ Below is the the provider for reasoning over your dataset rows with custom schema 🧠
https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/mistralai_150.py

My personal usage experience and findings:
⚠️ The original API may constantly fail with connection errors.
To bypass this limitation, use --attempts [COUNT] to withstand connection loss while iterating through JSONL/CSV data (see 📷 below).

💵 It is actually ~0.18 USD per 1M tokens.
🌟 Framework: https://github.com/nicolay-r/bulk-chain