Adibvafa Fallahpour

adibvafa

AI & ML interests

Machine Learning, Healthcare, Computational Biology, Neuroscience, Synthetic Biology

Recent Activity

updated a model 17 days ago
adibvafa/BLIP-MIMIC-CXR
liked a dataset 3 months ago
axiong/pmc_oa

Organizations

Vector Institute

adibvafa's activity

New activity in adibvafa/CodonTransformer 3 months ago
reacted to m-ric's post with 🔥 4 months ago
Post
3390
🔥 Qwen releases their 2.5 family of models: New SOTA for all sizes up to 72B!

The Chinese LLM maker just dropped a flurry of different models, ensuring there will be a Qwen SOTA model for every application out there:
Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
Qwen2.5-Coder: 1.5B, 7B, and 32B on the way
Qwen2.5-Math: 1.5B, 7B, and 72B.

And they didn't sleep: the performance is top of the game for each weight category!

Key insights:

🌐 All models have 128k token context length

📚 Models pre-trained on 18T tokens, even more than the 15T of Llama-3

💪 The flagship Qwen2.5-72B is ~competitive with Llama-3.1-405B, and has a 3-5% margin on Llama-3.1-70B on most benchmarks.

🇫🇷 On top of this, it takes the #1 spot on multilingual tasks, so it might become my standard for French

💻 Qwen2.5-Coder is only 7B but beats competing models up to 33B (DeepSeek-Coder 33B-Instruct). Let's wait for their 32B to come out!

🧮 Qwen2.5-Math sets a new high in the ratio of MATH benchmark score to # of parameters. They trained it by "aggregating more high-quality mathematical data, particularly in Chinese, from web sources, books, and codes across multiple recall cycles."

📄 Technical report to be released "very soon"

🔓 All models use the permissive Apache 2.0 license, except the 72B models, which have a custom license mentioning "you can use it for free EXCEPT if your product has over 100M users"

🤗 All models are available on the HF Hub! ➡️ Qwen/qwen25-66e81a666513e518adb90d9e
  • 2 replies
New activity in huggingface/HuggingDiscussions 4 months ago

[FEEDBACK] Daily Papers

107
#32 opened 8 months ago by
kramp