Maxi PRO

maxiw

AI & ML interests

Computer Agents | VLMs

Organizations

maxiw's activity

posted an update 4 days ago
Exciting to see open-source models thriving in the computer agent space! 🔥
I just built a demo for OS-ATLAS: A Foundation Action Model For Generalist GUI Agents. Check it out here: maxiw/OS-ATLAS

The demo takes a screenshot and an instruction as input and predicts bounding boxes.
reacted to cbensimon's post with ❤️ 2 months ago
Hello everybody,

We've rolled out a major update to ZeroGPU! All the Spaces are now running on it.

Major improvements:

1. GPU cold starts about twice as fast!
2. RAM usage reduced by two-thirds, allowing more effective resource usage, meaning more GPUs for the community!
3. ZeroGPU initializations (cold starts) can now be tracked and displayed (use progress=gr.Progress(track_tqdm=True))
4. Improved compatibility and PyTorch integration, so more Spaces are ZeroGPU-compatible without requiring any modifications!

Feel free to answer in the post if you have any questions

🤗 Best regards,
Charles
replied to m-ric's post 2 months ago
replied to their post 2 months ago

@fridayfairy this is not fine-tuned. It's the base model just prompted to return bounding boxes in a specific format. The Qwen2-VL models must have been pre-trained on detection data.

reacted to rwightman's post with 👍 2 months ago
The timm leaderboard timm/leaderboard has been updated with the ability to select different hardware benchmark sets: RTX4090, RTX3090, two different CPUs along with some NCHW / NHWC layout and torch.compile (dynamo) variations.

Also worth pointing out, there are three rather newish 'test' models that you'll see at the top of any samples/sec comparison:
* test_vit ( timm/test_vit.r160_in1k)
* test_efficientnet ( timm/test_efficientnet.r160_in1k)
* test_byobnet ( timm/test_byobnet.r160_in1k, a mix of resnet, darknet, effnet/regnet like blocks)

They are < 0.5M params, insanely fast and originally intended for unit testing w/ real weights. They have awful ImageNet top-1; it's rare for anyone to bother training a model this small on ImageNet (the classifier is roughly 30-70% of the param count!). However, they are FAST on very limited hardware and you can fine-tune them well on small data. Could be the model you're looking for?
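
To put the classifier share in perspective, here is a rough sketch of the arithmetic, assuming a single linear classification head over the 1000 ImageNet classes. The feature width and total parameter count in the example are hypothetical illustration values, not figures taken from the timm test models.

```python
def linear_head_params(feature_dim: int, num_classes: int = 1000) -> int:
    """Parameters in a linear classifier: one weight per (feature, class) pair plus one bias per class."""
    return feature_dim * num_classes + num_classes


def head_share(feature_dim: int, total_params: int, num_classes: int = 1000) -> float:
    """Fraction of the model's parameters that sit in the classifier head."""
    return linear_head_params(feature_dim, num_classes) / total_params


# Hypothetical model: 192-dim final features, 400k parameters in total.
print(f"classifier share: {head_share(192, 400_000):.0%}")
```

With a 1000-class head, even a narrow feature vector costs hundreds of thousands of parameters, which is why the head dominates at this model size.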
replied to their post 2 months ago
posted an update 2 months ago
The new Qwen2-VL models seem to perform quite well in object detection. You can prompt them to respond with bounding boxes in a reference frame of 1000 x 1000 pixels and then scale those boxes to the original image size.
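
The scaling step could look roughly like this. It is a minimal sketch, assuming boxes arrive as (x1, y1, x2, y2) in the 1000 x 1000 reference frame; the function name `scale_box` and the tuple layout are my own illustration, not part of the Qwen2-VL API or the Space's code.

```python
def scale_box(box, image_width, image_height, reference_size=1000):
    """Map a box from the reference frame to pixel coordinates of the original image.

    Assumption: `box` is (x1, y1, x2, y2) with values in [0, reference_size].
    """
    x1, y1, x2, y2 = box
    sx = image_width / reference_size
    sy = image_height / reference_size

    def clamp(value, upper):
        # Keep coordinates inside the image even if the model overshoots slightly.
        return max(0, min(upper, round(value)))

    return (
        clamp(x1 * sx, image_width),
        clamp(y1 * sy, image_height),
        clamp(x2 * sx, image_width),
        clamp(y2 * sy, image_height),
    )


# A box predicted in the reference frame, mapped onto a 1920 x 1080 screenshot.
print(scale_box((100, 200, 500, 800), 1920, 1080))
```

Clamping is optional, but it avoids out-of-bounds coordinates when drawing the boxes.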

You can try it out with my space maxiw/Qwen2-VL-Detection

posted an update 3 months ago
Just added the newly released xGen-MM v1.5 foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research to my xGen-MM HF Space maxiw/XGen-MM