
Bhadresh Savani

bhadresh-savani

AI & ML interests

NLP, Deep Learning, ML

Recent Activity

Organizations

Flax Community, HugGAN Community, ONNXConfig for all, Keras Dreambooth Event, Lambda Go Labs

bhadresh-savani's activity

upvoted an article about 21 hours ago

Hugging Face and JFrog partner to make AI Security more transparent

16
upvoted an article 3 days ago

Trace & Evaluate your Agent with Arize Phoenix

28
upvoted an article about 1 month ago

How to deploy and fine-tune DeepSeek models on AWS

51
reacted to lin-tan's post with 🔥 3 months ago
1441
Can language models replace developers? #RepoCod says “Not Yet”, because GPT-4o and other LLMs have <30% accuracy/pass@1 on real-world code generation tasks.
- Leaderboard: https://lt-asset.github.io/REPOCOD/
- Dataset: lt-asset/REPOCOD
@jiang719 @shanchao @Yiran-Hu1007
Compared to #SWEBench, RepoCod tasks:
- Are general code generation tasks, whereas SWE-Bench tasks resolve pull requests from GitHub issues.
- Have 2.6X as many tests per task (313.5 compared to SWE-Bench’s 120.8).

Compared to #HumanEval, #MBPP, #CoderEval, and #ClassEval, RepoCod has 980 instances from 11 Python projects, with
- Whole function generation
- Repository-level context
- Validation with test cases, and
- Real-world complex tasks: longest average canonical solution length (331.6 tokens) and the highest average cyclomatic complexity (9.00)

Introducing #RepoCod-Lite 🐟 for faster evaluations: 200 of the toughest tasks from RepoCod with:
- 67 repository-level, 67 file-level, and 66 self-contained tasks
- Detailed problem descriptions (967 tokens) and long canonical solutions (918 tokens)
- GPT-4o and other LLMs have < 10% accuracy/pass@1 on RepoCod-Lite tasks.
- Dataset: lt-asset/REPOCOD_Lite
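
For readers unfamiliar with the pass@1 figures quoted above, here is a minimal sketch of how a pass@k score is commonly computed. The post does not say which estimator RepoCod uses, so this assumes the widely used unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of samples generated for a task and c is the number that pass all tests. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical, chosen for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task (assumed estimator, not confirmed by the post).

    n: samples generated for the task
    c: samples that passed every test case
    k: number of attempts the metric allows
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples has no passing one
    # is C(n-c, k) / C(n, k); math.comb returns 0 when k > n - c.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given (n, c) pairs, one pair per task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Made-up counts for three tasks, purely to show the call shape.
    toy_results = [(10, 3), (10, 0), (10, 10)]
    print(f"pass@1 = {mean_pass_at_k(toy_results, k=1):.3f}")
```

With k = 1 the estimator reduces to c / n per task, so a benchmark-level pass@1 is simply the average fraction of passing samples across tasks.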

#LLM4code #LLM #CodeGeneration #Security
  • 2 replies
upvoted an article 7 months ago

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

58