Gabriele Sarti

gsarti

https://gsarti.com

AI & ML interests

Interpretability for generative language models

Recent Activity

liked a dataset 2 days ago

bespokelabs/bespoke-manim

liked a model 2 days ago

mlabonne/SmolGRPO-135M

updated a dataset 3 days ago

gsarti/qe4pe

gsarti's activity

commented a paper 11 days ago

We Can't Understand AI Using our Existing Vocabulary

Paper • 2502.07586 • Published 17 days ago • 10

commented a paper about 1 month ago

Enhancing Automated Interpretability with Output-Centric Feature Descriptions

Paper • 2501.08319 • Published Jan 14 • 10

commented a paper 3 months ago

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Paper • 2411.14257 • Published Nov 21, 2024 • 13

New activity in gsarti/opus-mt-tc-en-pl 5 months ago

how to fine tune this model to get better polish translation

#3 opened almost 2 years ago by

commented 2 papers 7 months ago

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models

Paper • 2408.00113 • Published Jul 31, 2024 • 7

Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses

Paper • 2408.00584 • Published Aug 1, 2024 • 6

New activity in huggingface/HuggingDiscussions 7 months ago

[FEEDBACK] Collections

#12 opened over 1 year ago by

New activity in unsloth/Phi-3-mini-4k-instruct-v0-bnb-4bit 7 months ago

Silent swapping of Phi-3 mini model

#1 opened 7 months ago by

commented 6 papers 8 months ago

LLM Circuit Analyses Are Consistent Across Training and Scale

Paper • 2407.10827 • Published Jul 15, 2024 • 5

Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs

Paper • 2406.20086 • Published Jun 28, 2024 • 5

Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs

Paper • 2406.20086 • Published Jun 28, 2024 • 5

Multi-property Steering of Large Language Models with Dynamic Activation Composition

Paper • 2406.17563 • Published Jun 25, 2024 • 4

Confidence Regulation Neurons in Language Models

Paper • 2406.16254 • Published Jun 24, 2024 • 10

Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation

Paper • 2406.13663 • Published Jun 19, 2024 • 7

New activity in ICLR2024/ICLR2024-papers 10 months ago

Update 18449

#2 opened 10 months ago by

New activity in ICLR2024/ICLR2024-papers 10 months ago

Update 18449

#12 opened 10 months ago by

New activity in gsarti/gradio_highlightedtextbox 10 months ago

tip + patch to solve typing

#2 opened 10 months ago by

New activity in aliabid94/gradio_modal 12 months ago

Modal defaults to shown when changing tab

#1 opened about 1 year ago by

New activity in gsarti/gradio_highlightedtextbox about 1 year ago

[BUG] Custom color map doesn't stick

#1 opened about 1 year ago by

New activity in social-post-explorers/README about 1 year ago

Rate limit issue with imprecise last post time information

#26 opened about 1 year ago by