How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
Paper • 2502.11196 • Published • 21
Thanks for your work on energy efficiency. It really piqued my curiosity!
Why do SmolLM-135M and SmolLM-1.7B get nearly the same score despite a roughly tenfold difference in model size? Is that mostly caused by the identical context size?
Could you please enable encoder-decoder models? In theory they should be more efficient, because the input only has to be encoded once and the encoder output can be reused at every decoding step.
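To make the efficiency argument concrete, here is a minimal sketch, assuming the Hugging Face `transformers` library and `t5-small` as a stand-in encoder-decoder model: the encoder runs exactly once per input, and its hidden states are then reused for cross-attention at every decoding step.

```python
# Sketch only: t5-small stands in for any encoder-decoder model.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer(
    "translate English to German: The house is small.", return_tensors="pt"
)

# Encode once: this cost is paid a single time per input.
encoder_outputs = model.get_encoder()(**inputs)

# Greedy decoding loop: only the decoder runs per step,
# reusing encoder_outputs for cross-attention.
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
for _ in range(20):
    out = model(
        encoder_outputs=encoder_outputs,
        decoder_input_ids=decoder_input_ids,
        attention_mask=inputs["attention_mask"],
    )
    next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
    decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
    if next_token.item() == model.config.eos_token_id:
        break

print(tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True))
```

(A real benchmark would also cache decoder key/value states, but the point here is just that the encoder forward pass does not repeat.)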
Good write-up, though it doesn't cover the dominant attention sink found in current decoder-only models:
https://colab.research.google.com/drive/1Fcgug4a6rv9F-Wej0rNveiM_SMNZOtrr?usp=sharing
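For readers who don't want to open the notebook, here is a minimal sketch, assuming GPT-2 via `transformers`, of what an attention sink looks like: in many decoder-only models a large share of attention mass piles up on the very first token, regardless of its content.

```python
# Sketch only: gpt2 is used as a readily available decoder-only model.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, heads, query, key) tensor per layer.
# Average over layers, batch, and heads, then read off how much
# attention each query position pays to key position 0 (the "sink").
attn = torch.stack(out.attentions).mean(dim=(0, 1, 2))
sink_share = attn[:, 0]
print("Mean attention paid to the first token, per query position:")
print(sink_share)
```

Later query positions typically still assign a disproportionate fraction of their attention to token 0, which is the behavior the notebook demonstrates in more detail.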