[FEEDBACK] Daily Papers

Note that this is not a thread for submitting new papers; it is for feedback on the Daily Papers community update feature.
How to submit a paper to the Daily Papers, like @akhaliq (AK)?
- Submission is open to paper authors
- Only recent papers (less than 7 days old) can be featured in the Daily
- Then drop the arXiv ID into the form at https://huggingface.co./papers/submit
- Add media to the paper (images, videos) when relevant
- You can start the discussion to engage with the community
Please check out the documentation
We are excited to share our recent work on MLLM architecture design titled "Ovis: Structural Embedding Alignment for Multimodal Large Language Model".
Paper: https://arxiv.org/abs/2405.20797
Github: https://github.com/AIDC-AI/Ovis
Model: https://huggingface.co./AIDC-AI/Ovis-Clip-Llama3-8B
Data: https://huggingface.co./datasets/AIDC-AI/Ovis-dataset
@Yiwen-ntu for now we support only videos as paper covers in the Daily.
We are excited to share our work titled "Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models": https://arxiv.org/abs/2406.12644
Dear AK and HF Team,
We are thrilled to present our recent research, which investigates and benchmarks various inference-time computation strategies to enhance reasoning performance in large language models (LLMs). With the growing interest in solving complex reasoning tasks, methods such as Best-of-N and beam search have shown promise in improving reasoning capabilities without requiring modifications to model parameters or additional training. However, challenges remain in their implementation, with many existing approaches still in the proof-of-concept stage, hindered by computational complexity and task-specific limitations.
In this work, we focus on optimizing both the candidate solution generation and the reward mechanisms that underpin these inference-time strategies. By exploring the impact of different prompting techniques, hyperparameters such as temperature and top-p, and reward types such as self-evaluation and RLHF rewards, we uncover previously overlooked strategies that significantly enhance reasoning performance. Our extensive experiments, spanning over 20,000 A100-80G GPU hours and more than 1,000 runs, cover various models from the Llama, Qwen, and Mistral families. These findings demonstrate that careful tuning of hyperparameters such as temperature can lead to performance gains of up to 5% in reasoning tasks.
Furthermore, we establish a standardized benchmark for evaluating inference-time computation techniques, assessing six representative methods across eight different reasoning tasks. Our work provides a robust foundation for advancing future research in this area, setting the stage for more practical and scalable applications of LLM-based reasoning systems.
Title: Bag of Tricks for Inference-time Computation of LLM Reasoning
Link: https://arxiv.org/abs/2502.07191
Github: https://github.com/usail-hkust/benchmark_inference_time_computation_LLM
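As a rough illustration of the kind of inference-time strategy benchmarked in the paper, here is a minimal Best-of-N sketch: sample N candidates with a tuned temperature/top-p, score each candidate with a reward, and keep the best one. The model name, prompt, and the simple Yes/No self-evaluation reward below are placeholder assumptions, not the paper's exact setup.

```python
# Minimal Best-of-N sketch (illustrative; not the paper's benchmark code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"   # small stand-in model (assumption)
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

question = "If a train travels 60 km in 1.5 hours, what is its average speed?"
prompt = f"Question: {question}\nAnswer step by step:"
inputs = tok(prompt, return_tensors="pt")

# 1) Candidate generation: N samples with a tuned temperature / top-p,
#    the hyperparameters the paper finds can shift accuracy by several points.
with torch.no_grad():
    out = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        num_return_sequences=8,              # N in Best-of-N
        max_new_tokens=128,
        pad_token_id=tok.eos_token_id,
    )
candidates = [
    tok.decode(seq[inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    for seq in out
]

# 2) Reward: a crude self-evaluation score, P("Yes") when the model is asked
#    whether the candidate answer is correct.
def self_eval_reward(question: str, answer: str) -> float:
    eval_prompt = (
        f"Question: {question}\nProposed answer: {answer}\n"
        "Is the proposed answer correct? Reply Yes or No:"
    )
    ids = tok(eval_prompt, return_tensors="pt")
    with torch.no_grad():
        last_logits = model(**ids).logits[0, -1]
    yes_id = tok(" Yes", add_special_tokens=False).input_ids[0]
    no_id = tok(" No", add_special_tokens=False).input_ids[0]
    return torch.softmax(last_logits[[yes_id, no_id]], dim=-1)[0].item()

# 3) Selection: keep the highest-reward candidate.
scores = [self_eval_reward(question, c) for c in candidates]
print(candidates[max(range(len(scores)), key=scores.__getitem__)])
```

Swapping in a stronger reward (e.g., a trained reward model) or re-tuning the sampling hyperparameters is exactly the kind of knob the benchmark sweeps over.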
Dear AK and HF Team,
We are excited to share our work on Text-to-SQL. The information for the paper we submitted is as follows:
Title: SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL
Link: https://arxiv.org/abs/2502.11741
Github: https://github.com/ShuaiLyu0110/SQL-o1
Dear AK and HF Team,
Buckle up for a wild ride into the world of large language models! Ever wished you could fine-tune massive LLMs without needing a full-blown data center? Well, dream no more! Our new approach, LoRAM, is here to train small and infer large, bringing you memory-efficient LoRA training without sacrificing performance.
Imagine turning a 70-billion-parameter beast into a nimble, memory-efficient marvel, like transforming an elephant into a sleek race car! We take the classic LoRA method, give it a trendy haircut by pruning away those underutilized neurons, and then recover the pruned low-rank matrices to supercharge the full model during inference.
The Challenge
While LoRA offers a cost-effective fine-tuning solution, the memory footprint remains dominated by the original model parameters. Training a 70B model traditionally demands an A100-80G GPU or even a fleet of 15 GPUs. Yikes!
The LoRAM Magic
LoRAM turns this challenge on its head by:
- Tiny Yet Mighty: Training on a pruned (small) model with just 20G of HBM, no need for heavyweight GPUs!
- Wallet-Friendly Wizardry: Structured pruning combined with 4-bit quantization (QLoRAM) slashes storage costs by up to 16.95×, proving that efficiency and performance can indeed dance together!
- Seamless Sync: Minimal-cost continual pre-training aligns the knowledge between the pruned and original models, ensuring no magic is lost in translation.
The Results
With LoRAM, we not only achieve dominant performance gains over both the original 70B model and smaller LoRA-trained models but also make massive model training accessible, running on a single 20G GPU!
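To make the prune, train small, recover, infer large loop concrete, here is a toy single-layer sketch. The dimensions, the magnitude-based pruning rule, and the zero-fill recovery are illustrative assumptions (and the continual pre-training alignment step is omitted); see the paper for the actual LoRAM procedure.

```python
# Toy single-layer LoRAM-style sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_out, rank, keep = 64, 64, 8, 32      # toy sizes; keep = width of the pruned model

full = nn.Linear(d_in, d_out, bias=False)    # stands in for one layer of the big model
full.weight.requires_grad_(False)

# 1) Structured pruning: keep the output neurons with the largest weight norms.
kept = full.weight.norm(dim=1).topk(keep).indices.sort().values
pruned = nn.Linear(d_in, keep, bias=False)
pruned.weight.data = full.weight.data[kept].clone()
pruned.weight.requires_grad_(False)

# 2) LoRA training on the pruned (memory-cheap) model: only A and B get gradients.
A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
B = nn.Parameter(torch.zeros(keep, rank))
opt = torch.optim.AdamW([A, B], lr=1e-3)
teacher = torch.randn(d_in, keep)            # toy target function to fine-tune towards
for _ in range(200):
    x = torch.randn(16, d_in)
    y = pruned(x) + x @ A.T @ B.T            # pruned layer plus low-rank update
    loss = ((y - x @ teacher) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# 3) Recovery: place the trained low-rank update back at the original neuron
#    positions (zero rows for pruned neurons) and merge it into the full layer.
B_full = torch.zeros(d_out, rank)
B_full[kept] = B.detach()
merged = nn.Linear(d_in, d_out, bias=False)
merged.weight.data = full.weight.data + B_full @ A.detach()

# Inference now runs on the original-size layer carrying the recovered update.
print(merged(torch.randn(2, d_in)).shape)    # torch.Size([2, 64])
```

The point of the sketch is that gradients only ever touch the small A and B matrices on the pruned model, while the full-size weights are only assembled once, at inference time.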
Curious to see the magic in action? Check out our paper and code:
- Paper: Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
- GitHub: LoRAM on GitHub
We can't wait for you to join us on this exhilarating journey where smart engineering meets a splash of neural magic!
Cheers,
The LoRAM Team
Dear AK and HF team,
We are excited to share our new paper estimating the hallucination rates of 11 multilingual large language models across 30 languages.
The paper comes with two open-source datasets that are ready for the community to use. The figure below shows hallucination rates across the 11 LLMs for all 30 languages.
Summary of our findings:
- Within an LLM family, smaller LLMs hallucinate more than their larger variants.
- A larger number of supported languages correlates significantly with a larger number of hallucinations.
- A smaller digital representation of a language does not necessarily mean higher hallucination rates.
Resources:
The paper releases two datasets, each covering 30 languages:
- Multilingual Hallucination Detection: https://huggingface.co./datasets/WueNLP/mHallucination_Detection
- Multilingual Hallucination Evaluation: https://huggingface.co./datasets/WueNLP/mHallucination_Evaluation
Paper, Dataset, and Code:
- arXiv paper: How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild
- Hugging Face collection: https://huggingface.co./collections/WueNLP/mhallucinations-llm-67b5aedb0e7fed1190e148d8
- Github: https://github.com/WorldHellow/mHallucinations-LLM
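For anyone who wants to poke at the data right away, here is a minimal loading sketch with the datasets library (any split or configuration names are assumptions; please check the dataset cards for the exact layout):

```python
# Minimal sketch: load the two released resources from the Hugging Face Hub.
from datasets import load_dataset

detection = load_dataset("WueNLP/mHallucination_Detection")
evaluation = load_dataset("WueNLP/mHallucination_Evaluation")

print(detection)    # inspect available splits and columns
print(evaluation)
```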
We hope the community enjoys reading and building on our work.
Cheers
Dear AK and HF Team,
We are excited to share our work on Multimodal Inconsistency Reasoning (MMIR). The information for the paper we submitted is as follows:
Title: Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Paper Link: https://arxiv.org/pdf/2502.16033
Github: https://github.com/eric-ai-lab/MMIR
Dataset: https://huggingface.co./datasets/rippleripple/MMIR
Dear AK and HF Team,
I'm super excited to recommend SE Arena, a new interactive platform for benchmarking Software Engineering chatbots.
If you're working with AI in software development, or just passionate about improving how these models perform in real-world dev workflows, you have to check it out!
The best part? SE Arena has a transparent, open-source leaderboard, and you can actively contribute by casting your votes to shape the evaluations. Plus, with RepoChat, it pulls in real repository context (issues, commits, PRs) to ground evaluations in realistic workflows.
Want to get involved and help drive the future of AI in software engineering? Head over to https://huggingface.co./spaces/SE-Arena/Software-Engineering-Arena and cast your vote today!
Our paper is published in FORGE 2025: https://conf.researchr.org/details/forge-2025/forge-2025-papers/6/SE-Arena-An-Interactive-Platform-for-Evaluating-Foundation-Models-in-Software-Engine
Check out the details at https://arxiv.org/abs/2502.01860
We'd love your feedback and contributions!
Can we align LLMs with personal preferences? Collecting enough individual annotations and training a separate LLM for each persona is hard... The answer is
Yes! Drift achieves personalized alignment with only 50-100 examples.
- Drift Approximation: For efficient preference modeling, we first define various attributes and find the best composite of them to explain the given examples.
- Differential Prompting: There is no need to construct attribute-dedicated datasets! We show that differential prompting can evaluate each attribute in a zero-shot manner.
- Drift Decoding: We align the LLM with the composite of attributes in a training-free manner, so there is no expensive LLM training or per-user model storage.
We theoretically justify the objectives of the approximation and decoding stages, and no stage in the entire process requires gradient computation.
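As a very rough sketch of what training-free, attribute-composed decoding can look like (the base model, the two example attributes, their weights, and this particular logit-composition rule are all illustrative assumptions, not Drift's actual formulation):

```python
# Illustrative attribute-composed decoding sketch (not Drift's exact method).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"    # small stand-in model (assumption)
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

user_prompt = "Recommend a weekend activity."
# Hypothetical attribute composite, e.g. fitted from 50-100 examples of one user.
attributes = {"The assistant is concise.": 0.6, "The assistant is playful.": 0.4}

def next_logits(prefix: str, generated_ids: torch.Tensor) -> torch.Tensor:
    """Next-token logits given a prompt prefix plus the tokens generated so far."""
    ids = tok(prefix, return_tensors="pt").input_ids
    ids = torch.cat([ids, generated_ids], dim=1)
    with torch.no_grad():
        return model(ids).logits[0, -1]

base_prefix = f"User: {user_prompt}\nAssistant:"
generated = torch.empty(1, 0, dtype=torch.long)
for _ in range(60):                            # plain greedy decoding loop
    base = next_logits(base_prefix, generated)
    combined = base.clone()
    for attribute, weight in attributes.items():
        attr_prefix = f"{attribute}\nUser: {user_prompt}\nAssistant:"
        # Differential signal: how conditioning on the attribute shifts the logits.
        combined += weight * (next_logits(attr_prefix, generated) - base)
    next_id = combined.argmax()
    if next_id.item() == tok.eos_token_id:
        break
    generated = torch.cat([generated, next_id.view(1, 1)], dim=1)

print(tok.decode(generated[0], skip_special_tokens=True))
```

The only point of the sketch is that personalization happens entirely at decoding time: the per-user signal lives in a handful of attribute weights rather than in any fine-tuned parameters.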
Check the details here! https://arxiv.org/abs/2502.14289
DeepSQL-R1-distill-8B: A Quantized DeepSeek AI Model for SQL Code Generation
- Outperforms Llama-3.2, Mistral-7B, and Claude-3 Sonnet in SQL generation tasks.
- Superior execution accuracy and faster inference speeds for complex SQL queries.
- Optimized for efficiency with quantization & distillation techniques.
Model Link: https://huggingface.co./imsanjoykb/deepSQL-R1-distill-8B
Code Link: https://github.com/imsanjoykb/deepSQL-R1-distill-8B
Paper : https://doi.org/10.6084/m9.figshare.28330301.v1
Inference: https://drive.google.com/file/d/145PP-oW50OMS1bYJaYuUphfufpsuOGWl/view?usp=sharing
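For anyone who wants to try the checkpoint, here is a minimal generation sketch with Transformers (the prompt format, toy schema, and generation settings below are assumptions; please follow the model card and the linked inference notebook for the intended usage):

```python
# Minimal sketch: generate SQL from a natural-language question with the released model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "imsanjoykb/deepSQL-R1-distill-8B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Toy schema and question purely for illustration.
schema = "CREATE TABLE orders (id INT, customer TEXT, total REAL, created_at DATE);"
question = "Total revenue per customer in 2024, highest first."
prompt = f"-- Schema:\n{schema}\n-- Question: {question}\n-- SQL:\n"

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```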