Shon Fernandez

flexicious

flexicious's activity

commented on Let's talk about LLM evaluation 12 days ago

From developing LLM applications over the past couple of years, I've realized that regardless of the hype, nothing beats testing LLMs on your own specific use cases with your own evaluation metrics. For example, I did a comparison of O3-mini vs R1 vs Gemini Flash Thinking (https://www.youtube.com/watch?v=iBS_FsLcSN0) and found that for certain use cases they are no better than regular non-reasoning models. I am very curious to learn what people are using reasoning models for and how they are evaluating them!
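
To make "your own use cases, your own metrics" a bit more concrete, here is a rough sketch of what a tiny comparison harness could look like in Python. Everything in it is an assumption for illustration: the `evaluate` function, the `contains_expected` metric, the two stub models, and the test cases are all hypothetical and are not the setup used in the video. In practice each stub would be replaced by a call to the model you actually want to test.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a "model" is any function mapping a prompt to a response,
# and a "metric" scores a response against an expected answer.
Model = Callable[[str], str]
Metric = Callable[[str, str], float]


def contains_expected(response: str, expected: str) -> float:
    """Toy metric: 1.0 if the expected answer appears in the response, else 0.0."""
    return 1.0 if expected.lower() in response.lower() else 0.0


def evaluate(models: Dict[str, Model],
             cases: List[Tuple[str, str]],
             metric: Metric) -> Dict[str, float]:
    """Return the mean metric score per model over all (prompt, expected) cases."""
    if not cases:
        raise ValueError("need at least one test case")
    scores: Dict[str, float] = {}
    for name, model in models.items():
        total = sum(metric(model(prompt), expected) for prompt, expected in cases)
        scores[name] = total / len(cases)
    return scores


# Stub models standing in for real API calls (assumption: swap in your own clients).
def stub_reasoning_model(prompt: str) -> str:
    return "Let me think step by step. The answer is 4."


def stub_plain_model(prompt: str) -> str:
    return "4"


# Hypothetical use-case-specific test cases: (prompt, expected answer).
cases = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]

results = evaluate(
    {"reasoning": stub_reasoning_model, "plain": stub_plain_model},
    cases,
    contains_expected,
)
print(results)
```

With these stubs both models get the same score, which is the kind of result that only shows up once you run your own cases. The metric is deliberately simplistic; the point is that the cases and the scoring function are yours to define.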