Releasing our new paper on AI safety alignment: "Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations", with Sayan Layek, Somnath Banerjee, and Soujanya Poria.
We propose Safety Arithmetic, a training-free framework that enhances LLM safety across three scenarios: base models, supervised fine-tuned (SFT) models, and edited models. Safety Arithmetic combines Harm Direction Removal (HDR) to avoid harmful content and Safety Alignment to promote safe responses.
Paper: https://arxiv.org/abs/2406.11801v1
Code: https://github.com/declare-lab/safety-arithmetic
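To give a feel for the two components, here is a minimal toy sketch of the general ideas behind them: removing a "harm direction" from model parameters via weight arithmetic, and steering a hidden activation toward a "safe" direction at inference time. Function names, the scaling factors `lam` and `alpha`, and the vector shapes are illustrative assumptions, not the paper's actual implementation; see the linked code repository for the real method.

```python
import numpy as np

def harm_direction_removal(base_params, harmful_params, lam=0.5):
    """Toy sketch of harm direction removal: treat the difference
    between harmfully fine-tuned and base parameters as a 'harm
    vector' and subtract a scaled copy of it from the base weights.
    (Hypothetical simplification, not the paper's exact HDR step.)"""
    harm_vector = harmful_params - base_params
    return base_params - lam * harm_vector

def steer_activation(hidden_state, safety_direction, alpha=0.5):
    """Toy sketch of test-time activation steering: nudge a hidden
    state along a unit-normalized 'safe' direction."""
    unit = safety_direction / np.linalg.norm(safety_direction)
    return hidden_state + alpha * unit

# Toy demo with 3-dimensional "parameters" and "activations".
base = np.array([1.0, 2.0, 3.0])
harmful = np.array([2.0, 2.0, 4.0])
safer_params = harm_direction_removal(base, harmful, lam=0.5)
print(safer_params)  # -> [0.5 2.  2.5]

hidden = np.array([0.0, 0.0, 1.0])
safe_dir = np.array([0.0, 1.0, 0.0])
steered = steer_activation(hidden, safe_dir, alpha=0.5)
print(steered)  # -> [0.  0.5 1. ]
```

In a real LLM the same ideas would operate on full weight tensors and on transformer hidden states (e.g. via forward hooks), but the arithmetic is the same shape as this sketch.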