
Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝)・Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Recent Activity

liked a dataset about 20 hours ago
open-thoughts/OpenThoughts-114k

Organizations

None yet

nicolay-r's activity

reacted to ggbetz's post with 👀 about 20 hours ago
We've just released syncIALO -- a multi-purpose synthetic debate and argument mapping corpus with more than 600k arguments:

📝 Blog article: https://huggingface.co./blog/ggbetz/introducing-syncialo
🛢️ Dataset: DebateLabKIT/syncialo-raw
👩‍💻 Code: https://github.com/debatelab/syncIALO

🤗 Hugging Face has sponsored the syncIALO project through inference time / compute credits. 🙏 We gratefully acknowledge the generous support. 🫶
replied to their post about 22 hours ago

@claudiohgdotta, thanks, edited!
That would be too much to expect from Qwen-2.5-MAX,
especially given how fast the Qwen demo inference is.

posted an update 2 days ago
📢 Qwen has released Qwen-2.5-MAX, which claims to outperform DeepSeek-V3 [Edited: not R1].
Here is how you can start applying it to handle CSV / JSONL data.
The model is served through an OpenAI-compatible API, so here is my wrapper for it:
🌌 https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/openai_156.py

🚀 All you have to do is set
base-url: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
and the API key of the platform.

↗️ Below is the link to the complete example (see screenshot):
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_qwen_25_max_chat.sh

📰 Source: https://www.alibabacloud.com/help/en/model-studio/developer-reference/what-is-qwen-llm
📺 Official Sandbox Demo: Qwen/Qwen2.5-Max-Demo
📜 Paper: https://arxiv.org/abs/2412.15115
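
For reference, below is a minimal sketch (my own example, not the wrapper linked above) of calling the model through that OpenAI-compatible endpoint with the official openai Python client. The model id "qwen-max" and the sentiment-style prompt are assumptions for illustration.

```python
# Minimal sketch: Qwen-2.5-MAX via the OpenAI-compatible DashScope endpoint.
# Assumptions: the "qwen-max" model id and a DASHSCOPE_API_KEY environment variable.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

def annotate_row(text: str) -> str:
    # In a CSV / JSONL workflow, this call would be repeated once per row.
    response = client.chat.completions.create(
        model="qwen-max",
        messages=[{"role": "user", "content": f"Extract the sentiment of: {text}"}],
    )
    return response.choices[0].message.content

print(annotate_row("The new release works surprisingly well."))
```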
reacted to singhsidhukuldeep's post with 🚀 2 days ago
Exciting Research Alert: Revolutionizing Complex Information Retrieval!

A groundbreaking paper from researchers at MIT, AWS AI, and UPenn introduces ARM (Alignment-Oriented LLM-based Retrieval Method), a novel approach to tackle complex information retrieval challenges.

>> Key Innovations

Information Alignment
The method first decomposes queries into keywords and aligns them with available data using both BM25 and embedding similarity, ensuring comprehensive coverage of information needs.
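
As a rough illustration of this hybrid alignment idea (my sketch, not the authors' code), the snippet below scores candidate data objects with both BM25 and embedding similarity. The rank_bm25 and sentence-transformers packages, the toy corpus, and the equal weighting are all assumptions.

```python
# Minimal sketch of hybrid keyword-to-object alignment: lexical BM25 + dense embeddings.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = [
    "table: employees(id, name, department, salary)",
    "table: departments(id, name, budget)",
    "document: annual budget report for engineering",
]
keywords = "engineering department budget"

# Lexical signal: BM25 over whitespace-tokenized objects.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
lexical = bm25.get_scores(keywords.lower().split())

# Semantic signal: cosine similarity of normalized sentence embeddings.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = encoder.encode(corpus, normalize_embeddings=True)
query_emb = encoder.encode(keywords, normalize_embeddings=True)
semantic = doc_emb @ query_emb

# Blend both signals; the 0.5/0.5 weighting is arbitrary, not a value from the paper.
lexical = lexical / (lexical.max() + 1e-9)
scores = 0.5 * lexical + 0.5 * semantic
print(corpus[int(np.argmax(scores))])
```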

Structure Alignment
ARM employs a sophisticated mixed-integer programming solver to identify connections between data objects, exploring relationships beyond simple semantic matching.

Self-Verification
The system includes a unique self-verification mechanism where the LLM evaluates and aggregates results from multiple retrieval paths, ensuring accuracy and completeness.

>> Performance Highlights

The results are impressive:
- Outperforms standard RAG by up to 5.2 points in execution accuracy on the BIRD dataset
- Achieves 19.3 points higher F1 scores compared to existing approaches on OTT-QA
- Reduces the number of required LLM calls while maintaining superior retrieval quality

>> Technical Implementation

The system uses a three-step process:
1. N-gram indexing and embedding computation for all data objects
2. Constrained beam decoding for information alignment
3. Mixed-integer programming optimization for structure exploration

This research represents a significant step forward in making complex information retrieval more efficient and accurate. The team's work demonstrates how combining traditional optimization techniques with modern LLM capabilities can solve challenging retrieval problems.
reacted to JingzeShi's post with 🤗 2 days ago
Welcome to the Doge Face Open Source Community! 🚀
Our goal for the next two years is to explore the indispensable foundation of embodied intelligence: small language models. 🔬
We aim to open-source code and documentation to give everyone more time to slack off while working or studying! 🤗
👉 Repository on GitHub: https://github.com/SmallDoges/small-doge
👉 Organization on Hugging Face: https://huggingface.co./SmallDoge
reacted to csabakecskemeti's post with 👀 2 days ago
Check out my idea:
LLmaaS - Local LLM as a Service

With LLmaaS, I propose leveraging locally running LLMs as a service, providing a standardized way for websites to access and utilize them for LLM-powered operations directly on the user’s device.

Demo, code, and a more detailed description:
https://devquasar.com/llmaas/
https://github.com/csabakecskemeti/LLmaaS
https://youtu.be/OOWGr8jcP5Q
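
To make the idea concrete, here is a minimal sketch of such a browser-facing proxy (my illustration, not the LLmaaS code). It assumes an Ollama-style local endpoint at http://localhost:11434/api/generate and a hard-coded model name; a real deployment would lock down the CORS policy as part of the security measures mentioned in the call for contributors below.

```python
# Minimal sketch of a proxy that lets a web page call a locally running LLM.
# Assumptions: an Ollama-style endpoint and a non-streaming, single-turn request.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

LOCAL_LLM_URL = "http://localhost:11434/api/generate"  # hypothetical local service

class ProxyHandler(BaseHTTPRequestHandler):
    def _cors_headers(self):
        # Wide-open CORS for the demo; restricting allowed origins is the obvious hardening step.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):
        # Answer the browser's CORS preflight request.
        self.send_response(204)
        self._cors_headers()
        self.end_headers()

    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        prompt = json.loads(body).get("prompt", "")
        payload = json.dumps({"model": "llama3", "prompt": prompt, "stream": False}).encode()
        req = Request(LOCAL_LLM_URL, data=payload, headers={"Content-Type": "application/json"})
        with urlopen(req) as resp:
            answer = json.loads(resp.read()).get("response", "")
        reply = json.dumps({"response": answer}).encode()
        self.send_response(200)
        self._cors_headers()
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), ProxyHandler).serve_forever()
```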

Call for contributors
Join me in developing the LLmaaS proxy into a general-purpose tool for leveraging local LLMs on the web, with built-in security measures.
I'm looking for help to make the proxy generic enough to support multiple local LLM services without any change on the HTML side.
Also looking for ideas on how to make the HTML part more modular and easy to use.
reacted to fdaudens's post with 👀 2 days ago
📊 R1 just built its own download dashboard!

Some fresh stats: +6M downloads for 800+ derivative models vs 2M for originals. Watch the numbers grow here: fdaudens/deepseek-download-stats
reacted to Pendrokar's post with ❤️ 2 days ago
TTS: Added Kokoro v1, Parler Large, LlaSa 3B & MARS 6 TTS models to the Arena.
Pendrokar/TTS-Spaces-Arena

I had also added MaskGCT, GPT-SoVITS & OuteTTS a month ago. The OuteTTS devs did say that it is too early for it to be added to TTS arenas.

MARS 5 does have a Space with open-weights models, but inference is way too slow (2+ minutes).
reacted to chansung's post with 👍 2 days ago
Simple Paper Review #5

I briefly reviewed the paper "SFT Memorizes, RL Generalizes," which compares SFT and RL in LLM/VLM post-training; it comes from HKU, UC Berkeley, Google DeepMind, and New York University.

The conclusion suggests SFT excels at memorization, while RL is better for generalization. However, since LLMs/VLMs should benefit humans beyond just generalization, a mix of SFT and RL is advisable. Typically, some SFT is done first so the model learns prompt formats, followed by RL to enhance generalization through trial and error.

The study focused on one model, Llama-3.2-Vision-11B, using environments like General Points for arithmetic reasoning and V-IRL for spatial reasoning. Training data was used for both SFT and RL, with evaluations on in-distribution and out-of-distribution data to assess memorization and generalization.

I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as the effort will directly influence the outcomes.

https://arxiv.org/abs/2501.17161
posted an update 3 days ago
📢 If you are looking forward to reviewing ✍️ the @deepseek_ai R1 model in your studies 📰, the post cited below will be helpful. It breaks down 🔨 all the key concepts within just a single paragraph.

📜 Original paper: https://arxiv.org/abs/2501.12948
reacted to singhsidhukuldeep's post with 🔥 3 days ago
Excited to share groundbreaking research in Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG)!

Researchers from the University of Science and Technology of China have developed FRAG - a novel flexible modular framework that revolutionizes how Large Language Models (LLMs) reason with knowledge graphs.

What makes FRAG special? It intelligently adapts retrieval strategies based on query complexity without requiring expensive KG fine-tuning. The framework uses a reasoning-aware module to classify queries as simple or complex, then applies tailored retrieval pipelines.

Under the hood:
- For simple queries: Uses breadth-first search and ranking to efficiently find relevant paths
- For complex queries: Employs shortest path algorithms to minimize computational overhead
- Features a preprocessing-retrieval-postprocessing pipeline with flexible components
- Leverages traditional algorithms like PersonalizedPageRank for subgraph extraction (see the sketch below)
- Implements edge and path ranking models for precise information filtering
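
For the subgraph-extraction step, here is a minimal sketch (my illustration, not the FRAG implementation) of Personalized PageRank over a toy graph using networkx; the top-k cutoff and the uniform seed weighting are assumptions.

```python
# Minimal sketch: extract a query-focused subgraph with Personalized PageRank.
import networkx as nx

def ppr_subgraph(graph, seed_entities, top_k=20):
    # Concentrate the restart probability on the entities mentioned in the query.
    personalization = {node: (1.0 if node in seed_entities else 0.0) for node in graph}
    scores = nx.pagerank(graph, alpha=0.85, personalization=personalization)
    # Keep the top-k highest-scoring nodes and the edges among them.
    keep = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return graph.subgraph(keep).copy()

if __name__ == "__main__":
    g = nx.karate_club_graph()  # toy stand-in for a real knowledge graph
    sub = ppr_subgraph(g, seed_entities=[0, 33], top_k=10)
    print(sub.number_of_nodes(), "nodes,", sub.number_of_edges(), "edges")
```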

The results are impressive - FRAG achieves state-of-the-art performance while maintaining high efficiency and low resource consumption. On benchmark datasets like WebQSP and CWQ, it outperforms existing approaches by significant margins.

Most importantly, FRAG maintains flexibility and modularity while improving retrieval quality - no expensive LLM fine-tuning required! This makes it highly practical for real-world applications.

This work represents a major step forward in making LLMs more reliable and capable of complex reasoning tasks. Looking forward to seeing how this technology evolves!
reacted to prithivMLmods's post with 👀 3 days ago
o3-mini and DeepSeek R1
Worked out with some famous and weird examples.

🔥 Blog: https://huggingface.co./blog/prithivMLmods/o3-mini-vs-deepseek-r1

Prompt : Using HTML, CSS, and JavaScript in a single HTML file to create a simulation of the solar system. Pay extreme attention to the UI to make it as intuitive as possible. Ensure that every planet appears as a sphere and is labeled with its corresponding name.

Example 1: o3-mini; example 2: DeepSeek R1

Q2 : https://huggingface.co./blog/prithivMLmods/o3-mini-vs-deepseek-r1#q2--web-solar-system-explorer
reacted to ezgikorkmaz's post with 👍 3 days ago
reacted to as-cle-bert's post with 👍 4 days ago
Hi Hugging Face community! 🤗

I just published an article in which I try to articulate some counter-points to Dario Amodei's post "On DeepSeek and Export Controls" 👉 https://huggingface.co./blog/as-cle-bert/why-we-dont-need-export-control

I try to address several key passages of the third section of Amodei's post (https://darioamodei.com/on-deepseek-and-export-controls), bringing my perspective on the importance of open source, open knowledge, and multipolarity in a field as crucial for our future as Artificial Intelligence.

Happy reading!✨
reacted to MonsterMMORPG's post with 👀 4 days ago
Paints-UNDO Installers Published - Undo Images Like Drawing From Scratch - 1-Click Install for Windows, RunPod, Massed Compute, Kaggle

Installers shared here : https://www.patreon.com/posts/121228327

Check attached images

PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings

What this app does: you give it an input image, and it tries to mimic how that image could have been drawn by an artist, as a sequence of drawing steps.

The app generates images at individual points in time, as well as a video of the drawing process, as if the image were drawn from scratch.

Official Repo : https://github.com/lllyasviel/Paints-UNDO

We have a Low VRAM mode, and it works great.

1-Click Installers and Gradio APP : https://www.patreon.com/posts/121228327

We have 1-click installers for Windows, RunPod, Massed Compute, and a free Kaggle notebook. Read this post extremely carefully to learn how to use them all.
reacted to kristaller486's post with 🚀 4 days ago
Nebo-T1-Russian

(Probably) the first "longCoT" dataset for the Russian language, created via DeepSeek-R1.

- Prompts taken from the Sky-T1 dataset and translated via Llama3.3-70B.
- Answers and reasoning generated by Deepseek-R1 (685B).
- 16.4K samples in total, ≈12.4K Russian-only (in the rest, either the answer or the reasoning is in English).
- Languages in the answers and reasoning are labeled using fastText (see the sketch below).
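
For the language-labeling step, a minimal sketch (an assumed setup, not necessarily the dataset's exact pipeline) with fastText's pretrained lid.176 identification model could look like this:

```python
# Minimal sketch: label the language of answers/reasoning with fastText.
# Requires `pip install fasttext` and the pretrained lid.176.ftz model from fasttext.cc.
import fasttext

model = fasttext.load_model("lid.176.ftz")

def detect_language(text: str) -> str:
    # fastText expects single-line input; labels come back as e.g. "__label__ru".
    labels, _probs = model.predict(text.replace("\n", " "), k=1)
    return labels[0].replace("__label__", "")

print(detect_language("Сколько будет два плюс два?"))  # -> "ru"
print(detect_language("What is two plus two?"))        # -> "en"
```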

kristaller486/Nebo-T1-Russian
reacted to rubenroy's post with 🔥 4 days ago
reacted to chansung's post with 👍 4 days ago
A brief summary of the o3-mini

The OpenAI o3-mini model is a significant improvement over the o1-mini, reaching o1 performance levels. While generally good, its performance isn't universally better than previous models (o1, o1-prev.) or GPT-4o across all benchmarks. This means workflows should be re-evaluated with each model upgrade.

The o3-mini has "low," "medium," and "high" versions, with "low" being the base model used for benchmarking. It's speculated that the higher versions simply involve more processing. A fair comparison with other models like Gemini 2.0 Thinking or DeepSeek-R1 would likely need to use the "low" version and a similar "think more" mechanism.

The system card is recommended reading due to its comprehensive benchmark data.

https://openai.com/index/openai-o3-mini/
posted an update 4 days ago
📢 The distilled 8B version of DeepSeek R1 based on LLaMA-3.1-8B is available, besides the one based on Qwen.

📙 Notebook for using it to reason over a series of data 🧠:
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_deep_seek_7b_distill_llama3.ipynb

Loading using the pipeline API of the transformers library:
https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_llama.py
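
For reference, a minimal sketch of such pipeline-based loading (my example, not the provider script linked above) might look as follows in recent transformers versions; the FP16 dtype matches the memory figures below.

```python
# Minimal sketch: load the distilled Llama-based R1 model with the transformers pipeline API.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    torch_dtype=torch.float16,  # FP16 keeps memory usage near the figure reported below
    device_map="auto",
)

messages = [{"role": "user", "content": "Is 9.11 larger than 9.9? Think step by step."}]
out = pipe(messages, max_new_tokens=512)
print(out[0]["generated_text"][-1]["content"])
```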
🟑 GPU Usage: 12.3 GB (FP16/FP32 mode) which is suitable for T4. (a 1.5 GB less than Qwen-distilled version)
🐌 Perfomance: T4 instance: ~0.19 tokens/sec (FP32 mode) and (FP16 mode) ~0.22-0.30 tokens/sec. Is it should be that slow? πŸ€”
Model name: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
⭐ Framework: https://github.com/nicolay-r/bulk-chain
🌌 Notebooks and models hub: https://github.com/nicolay-r/nlp-thirdgate
posted an update 5 days ago
🚨 MistralAI is back with the mistral small V3 model update and it is free! πŸ‘
https://docs.mistral.ai/getting-started/models/models_overview/#free-models

πŸš€ Below is the the provider for reasoning over your dataset rows with custom schema 🧠
https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/mistralai_150.py

My personal usage experience and findings:
⚠️ The original API may constantly fail with connection errors.
To bypass this limitation, use --attempts [COUNT] to withstand connection loss while iterating through JSONL/CSV data (see 📷 below).

💵 It is actually ~0.18 USD per 1M tokens.
🌟 Framework: https://github.com/nicolay-r/bulk-chain