Ross Wightman

rwightman

AI & ML interests

Computer vision, transfer learning, semi/self supervised learning, robotics.

rwightman's activity

New activity in timm/ViT-SO400M-14-SigLIP2-378 2 days ago

Model size seems odd

#1 opened 2 days ago by

bbb42

reacted to csabakecskemeti's post with 🤗🚀 4 days ago

Post

2693

Testing training on AMD/ROCm for the first time!

I've got my hands on an AMD Instinct MI100. Used, it costs about the same as a V100, but on paper it has more TOPS (V100 14 TOPS vs MI100 23 TOPS). The HBM also has a faster clock, so the memory bandwidth is 1.2 TB/s.
For quantized inference it's a beast (the MI50 was also surprisingly fast).

For LoRA training, in this quick test I could not make the bnb config work, so I'm running the FT on the full-size model.

I will share all the install, setup, and settings I've learned in a blog post, together with the cooling shroud 3D design.

8 replies

replied to csabakecskemeti's post 4 days ago

Yeah it's 112 for PCIe V100 and 125 for the SXM I think. One thing I was never clear on with the MI100 and other MIxx chip specs is whether their float16 'matrix' numbers are float16 matrix mul w/ float32 accumulate (which is what you'd want). The datacenter NVIDIA chip 'tensor core' FLOPS are usually float32 acc (unless it's a gamer card, in which case that's halved).
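To illustrate why float32 accumulation is "what you'd want", here is a minimal sketch that emulates a dot product of float16 inputs with two different accumulator widths. It is a plain-Python emulation (rounding through `struct`), not how any GPU actually implements its matrix units, and the helper names `to_fp16`, `to_fp32`, `dot_fp16_acc`, and `dot_fp32_acc` are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_fp16_acc(a, b):
    """Dot product where every product and partial sum is rounded to float16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x * y))
    return acc


def dot_fp32_acc(a, b):
    """Dot product of float16 inputs with the running sum kept in float32."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp32(acc + x * y)
    return acc


# 4096 ones: the exact answer is 4096.
n = 4096
a = [to_fp16(1.0)] * n
b = [to_fp16(1.0)] * n

print(dot_fp16_acc(a, b))  # stalls at 2048.0: 2049 is not representable in float16
print(dot_fp32_acc(a, b))  # 4096.0
```

With a float16 accumulator the sum stops growing once it reaches 2048, because the spacing between adjacent float16 values there is 2 and adding 1 rounds back down. Long reductions in training behave the same way, which is why the accumulator width behind a quoted 'matrix' number matters.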

The MI100 does have native bfloat16 which is a big win over V100.

I do feel though you are getting good TOPS/$ here because AMD hasn't been that successful in competing with NVIDIA on the full system offer (chips + driver/software). I've really really wanted this to change but AMD keeps frustrating... how do you find working with it so far in terms of issues / crashes / head banging? :) Hopefully things have been improving

replied to csabakecskemeti's post 4 days ago

FWIW, the MI100 was released after the A100, 3 years after the V100... that says something :) Also, it's the matrix / tensor core mixed or reduced precision FLOPS that are of interest, not the float32 FLOPS, which are the 14 & 23 numbers.
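A rough sketch of the arithmetic behind that point, using only the figures quoted in this thread (14 and 23 for float32, 112 and 125 for V100 tensor cores, with the tensor-core figures given as "I think" in the reply above). The dictionary names and the `ratio` helper are hypothetical, and the numbers are taken as TFLOPS on the assumption that this is what the thread means.

```python
# Figures quoted in the thread, assumed to be peak TFLOPS.
fp32_tflops = {"V100": 14.0, "MI100": 23.0}
v100_tensor_tflops = {"PCIe": 112.0, "SXM": 125.0}


def ratio(numerator: float, denominator: float) -> float:
    """Return how many times larger `numerator` is than `denominator`."""
    return numerator / denominator


# float32 comparison from the original post.
mi100_vs_v100_fp32 = ratio(fp32_tflops["MI100"], fp32_tflops["V100"])

# How much larger the V100 tensor-core figure is than its own float32 figure.
v100_tensor_vs_fp32 = {
    form: ratio(tflops, fp32_tflops["V100"])
    for form, tflops in v100_tensor_tflops.items()
}

print(f"MI100 vs V100, float32: {mi100_vs_v100_fp32:.2f}x")
for form, r in v100_tensor_vs_fp32.items():
    print(f"V100 {form} tensor core vs float32: {r:.1f}x")
```

The float32 gap between the two cards is under 2x, while the V100's own tensor-core figure is roughly 8 to 9 times its float32 figure, so comparing the 14 and 23 numbers says little about mixed-precision training throughput.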

upvoted a paper 7 days ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published 8 days ago • 118

updated 14 models 7 days ago