lisass123

linsa11

AI & ML interests

None yet

Recent Activity

reacted to etemiz's post with 🤗 21 days ago

-= DeepSeek V3 =-

After installing the new CUDA toolkit and recompiling llama.cpp, I tested DeepSeek V3 yesterday. In terms of human alignment, compared to DeepSeek 2.5, DeepSeek V3 did worse on:
- health
- fasting
- nostr
- misinfo
- nutrition

and did better on:
- faith
- bitcoin
- alternative medicine
- ancient wisdom

In my opinion it is worse than 2.5 overall, and 2.5 wasn't that great.

There is a general tendency for models to get smarter while at the same time becoming less wise, less human-aligned, and less beneficial to humans. I don't know what is causing this, but maybe using synthetic datasets to further train LLMs makes them more and more detached from humanity. This is not going in the right direction.

My solution is to form a curator council to determine which datasets are closest to human preference. "Humans that care about other humans the most" could be a definition of this dataset. What do you think?

reacted to etemiz's post with 👀 21 days ago

upvoted an article 21 days ago

Mastering Tensor Dimensions in Transformers


Organizations

None yet

linsa11's activity

upvoted an article 21 days ago

Mastering Tensor Dimensions in Transformers

Article • 22 days ago • 42

upvoted a paper 2 months ago

Can LLMs Learn by Teaching? A Preliminary Study

Paper • 2406.14629 • Published Jun 20, 2024 • 20

upvoted a collection 3 months ago

📑 Trending Papers - October 🔟

Collection

10 items • Updated Dec 24, 2024 • 6

upvoted 2 papers 3 months ago

Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks

Paper • 2410.24032 • Published Oct 31, 2024 • 9

BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments

Paper • 2410.23918 • Published Oct 31, 2024 • 19

upvoted a collection 3 months ago

my paper

Collection

Browsing paper collection • 5 items • Updated Nov 12, 2024 • 1

upvoted 7 papers 3 months ago

What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

Paper • 2410.23743 • Published Oct 31, 2024 • 60

Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

Paper • 2410.22366 • Published Oct 28, 2024 • 77

ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

Paper • 2410.17856 • Published Oct 23, 2024 • 49

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA

Paper • 2410.20672 • Published Oct 28, 2024 • 6

GPT-4o System Card

Paper • 2410.21276 • Published Oct 25, 2024 • 83

Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning

Paper • 2410.22304 • Published Oct 29, 2024 • 17

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published Oct 22, 2024 • 89