Flax Community

non-profit

https://github.com/huggingface/transformers/tree/master/examples/research_projects/jax-projects

AI & ML interests

JAX, Flax, TPU, 🤗

Recent Activity

gagan3012 authored a paper 22 days ago

DateLogicQA: Benchmarking Temporal Biases in Large Language Models

paws authored a paper about 1 month ago

Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models

gagan3012 authored a paper 2 months ago

Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks

flax-community's activity

rwightman

posted an update 2 days ago

New timm 1.0.13 and OpenCLIP 2.30.0 releases to start the year. Both are modest but worthwhile updates.

timm added a number of new model weights, supporting loading of:
* PaliGemma2 encoders (ported from google/paligemma-2-release-67500e1e1dbfdd4dee27ba48)
* AIMv2 encoders (ported from apple/aimv2-6720fe1558d94c7805f7688c)

A few higher-resolution (384x384) ConvNeXt-Nano ImageNet-12k pretrained and fine-tuned weights were added too. See other changes here: https://github.com/huggingface/pytorch-image-models/releases/tag/v1.0.13

Support was also added in both OpenCLIP and timm for two CLIP models that had been missed. The DFN L/14 is 🔥
* DFN CLIP L/14 w/ 39B samples seen - apple/DFN2B-CLIP-ViT-L-14-39B, timm/vit_large_patch14_clip_224.dfn2b_s39b
* MetaCLIP H/14 (altogether) - timm/vit_huge_patch14_clip_224.metaclip_altogether

And lastly, ~70-80 models that were relying on timm remapping from OpenCLIP got their own timm Hub instances to allow use with the upcoming Transformers TimmWrapperModel.
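The model IDs above follow timm's `architecture.pretrained_tag` convention: the part before the first dot picks the architecture and the rest picks the weight set. Passing the full string (e.g. `vit_large_patch14_clip_224.dfn2b_s39b`) to `timm.create_model(..., pretrained=True)` is how timm selects a specific set of weights. The stdlib-only helper below is a hypothetical illustration of the split, not timm's own utility.

```python
from typing import Tuple


def split_name_tag(full_name: str) -> Tuple[str, str]:
    """Split 'arch.tag' into (arch, tag); tag is '' when no tag is given.

    Hypothetical helper illustrating timm's 'architecture.pretrained_tag' naming.
    """
    arch, _, tag = full_name.partition(".")  # split on the first dot only
    return arch, tag


for name in (
    "vit_large_patch14_clip_224.dfn2b_s39b",
    "vit_huge_patch14_clip_224.metaclip_altogether",
    "vit_large_patch14_clip_224",  # no tag: timm falls back to the default weights
):
    print(split_name_tag(name))
```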

paws

authored a paper about 1 month ago

Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models

Paper • 2412.02980 • Published Dec 4, 2024 • 12

rwightman

posted an update about 1 month ago

There's a new timm release, v1.0.12, with a focus on optimizers. The optimizer factory has been refactored, and there's now a timm.optim.list_optimizers() and a new way to register optimizers and their attributes. As always, you can use a timm optimizer like a torch one: just replace torch.optim with timm.optim.

New optimizers include:
* AdafactorBigVision - adafactorbv
* ADOPT - adopt / adoptw (decoupled decay)
* MARS - mars
* LaProp - laprop
* Cautious Optimizers - a modification available for all of the above by prefixing the name with c, as well as for cadamw, cnadamw, csgdw, clamb, crmsproptf

I shared some caution comparisons in this model repo: rwightman/timm-optim-caution

For details, references, see the code: https://github.com/huggingface/pytorch-image-models/tree/main/timm/optim
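The idea behind the "caution" modification can be sketched in NumPy: update components whose sign disagrees with the current gradient are zeroed, and the surviving ones are rescaled by the inverse of the fraction kept. This is a minimal sketch on a plain momentum-SGD step, assuming that masking rule; the function names are hypothetical and this is not timm's implementation. In timm you would simply pick the `c`-prefixed optimizer name.

```python
import numpy as np


def cautious_mask(update: np.ndarray, grad: np.ndarray, min_frac: float = 1e-3) -> np.ndarray:
    """Rescaled 0/1 mask keeping components where update and gradient agree in sign."""
    mask = (update * grad > 0).astype(update.dtype)
    # Divide by the kept fraction so the masked update keeps a similar overall scale;
    # clamp it so we never divide by ~0 when nearly everything is masked.
    return mask / max(mask.mean(), min_frac)


def cautious_step(param, grad, momentum, lr=0.1, beta=0.9):
    """One hypothetical SGD-with-momentum step with the cautious mask applied."""
    momentum = beta * momentum + (1 - beta) * grad
    param = param - lr * momentum * cautious_mask(momentum, grad)
    return param, momentum


p = np.array([1.0, -2.0, 0.5, 3.0])
g = np.array([0.5, 0.1, -0.2, 0.3])
m = np.array([-1.0, 1.0, 1.0, 1.0])
p, m = cautious_step(p, g, m)
# Components 0 and 2 (momentum opposes gradient) are left untouched this step.
print(p)
```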

rwightman

posted an update about 2 months ago

I'm currently on a push to expand the scope of image-based datasets on the Hub. There's certainly a lot already, but for anyone who's looked closely, there's not a whole lot of standardization. I aim to fix that: datasets under the https://huggingface.co./timm and https://huggingface.co./pixparse orgs will serve as canonical examples for various task/modality combinations and be usable without fuss in libraries like timm, OpenCLIP, and hopefully more.

I just uploaded the first multi-label dataset that I'll support with timm scripts soon: timm/plant-pathology-2021

Next up object detection & segmentation! I've got an annotation spec sorted out, a lot of datasets ready to rip, and yeah that means timm support for object detection, eventually segmentation, is finally under development :O
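A multi-label dataset attaches a set of labels to each image instead of a single class index, so targets are commonly encoded as multi-hot vectors and trained with a per-class sigmoid / binary cross-entropy loss rather than softmax cross-entropy. A minimal sketch of the encoding; the class names are hypothetical placeholders, not the plant-pathology label set, and this is not necessarily how timm's scripts will do it.

```python
from typing import List, Sequence


def multi_hot(labels: Sequence[str], classes: Sequence[str]) -> List[int]:
    """Encode a set of label names as a 0/1 vector over a fixed class list."""
    index = {name: i for i, name in enumerate(classes)}
    vec = [0] * len(classes)
    for name in labels:
        vec[index[name]] = 1  # raises KeyError for a label outside the class list
    return vec


classes = ["label_a", "label_b", "label_c", "label_d"]  # hypothetical class list
print(multi_hot(["label_b", "label_d"], classes))  # -> [0, 1, 0, 1]
print(multi_hot([], classes))                      # -> [0, 0, 0, 0]
```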

rwightman

posted an update about 2 months ago

Want to validate some hparams or figure out what timm model to use before committing to download or training with a large dataset? Try mini-imagenet: timm/mini-imagenet

I had this sitting on my drive and forgot where I pulled it together from. It's 100 classes of ImageNet, 50k train and 10k val images (from the ImageNet-1k train set), and 5k test images (from the ImageNet-1k val set). 7.4GB instead of > 100GB for the full ImageNet-1k. This version is not reduced resolution like some other 'mini' versions. Super easy to use with timm train/val scripts; check out the dataset card.

I often check fine-tuning with even smaller datasets like:
* timm/resisc45
* timm/oxford-iiit-pet
But those are a bit small to train any modest size model w/o starting from pretrained weights.
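The split sizes above work out per class as follows, assuming the images are spread evenly over the 100 classes (the post doesn't say so explicitly). Since the ImageNet-1k validation set has 50 images per class, a 5k test split over 100 classes would use every ImageNet-1k val image of those classes.

```python
# Per-class counts for the mini-imagenet split described above,
# ASSUMING the images are spread evenly across the 100 classes.
num_classes = 100
splits = {"train": 50_000, "val": 10_000, "test": 5_000}

per_class = {name: count // num_classes for name, count in splits.items()}
print(per_class)  # -> {'train': 500, 'val': 100, 'test': 50}

# ImageNet-1k's own validation set has 50 images per class, so a 50-per-class
# test split uses every ImageNet-1k val image of the selected classes.
IN1K_VAL_PER_CLASS = 50
print(per_class["test"] == IN1K_VAL_PER_CLASS)  # -> True

# Size: 7.4 GB vs. "> 100 GB" for full ImageNet-1k, so this is an upper bound.
print(f"< {7.4 / 100:.1%} of the full dataset size")  # -> < 7.4% of the full dataset size
```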

rwightman

posted an update about 2 months ago

New MobileNetV4 weights were uploaded a few days ago -- more ImageNet-12k training at 384x384 for the speedy 'Conv Medium' models.

There are 3 weight variants here for those who like to tinker. On my hold-out eval they are ordered as below; they're not that different, but the Adopt 180-epoch weights are closer to AdamW 250 than to AdamW 180.
* AdamW for 250 epochs - timm/mobilenetv4_conv_medium.e250_r384_in12k
* Adopt for 180 epochs - timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
* AdamW for 180 epochs - timm/mobilenetv4_conv_medium.e180_r384_in12k

This was by request, as a user reported impressive results using the 'Conv Large' ImageNet-12k pretrains as object detection backbones. ImageNet-1k fine-tunes are pending; the weights do behave differently with 180 vs 250 epochs and with the Adopt vs AdamW optimizer.
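The three weight names encode their training setup in the pretrained tag. Reading the tags against the descriptions in this post, `e<N>` looks like the epoch count, `ad` marks Adopt (AdamW otherwise), `r<N>` the train resolution, and `in12k` ImageNet-12k. The decoder below is a hypothetical sketch built on that reading only; it is not an official timm tag spec and will mislabel tags from other recipes.

```python
import re
from typing import Dict


def decode_tag(tag: str) -> Dict[str, object]:
    """Hypothetical decoder for tags like 'e180_ad_r384_in12k'.

    Token meanings are inferred from the three MobileNetV4 weights in this post;
    this is NOT an official timm specification.
    """
    # AdamW unless 'ad' is present: true for these three weights, not in general.
    info: Dict[str, object] = {"optimizer": "AdamW"}
    for tok in tag.split("_"):
        if (m := re.fullmatch(r"e(\d+)", tok)):
            info["epochs"] = int(m.group(1))
        elif (m := re.fullmatch(r"r(\d+)", tok)):
            info["resolution"] = int(m.group(1))
        elif tok == "ad":
            info["optimizer"] = "Adopt"
        elif tok.startswith("in"):
            info["dataset"] = "ImageNet-" + tok[2:]
    return info


for tag in ("e250_r384_in12k", "e180_ad_r384_in12k", "e180_r384_in12k"):
    print(tag, decode_tag(tag))
```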

sidicity

authored a paper 2 months ago

Survey of Cultural Awareness in Language Models: Text and Beyond

Paper • 2411.00860 • Published Oct 30, 2024 • 23

nbroad

posted an update 2 months ago

https://huggingface.co./organizations/nerdyface/share/xvWxWxYmYpCLqZlvNJEZbJHFsDITAicJAT

nbroad

posted an update 2 months ago

hi florent and livestream!

nbroad

authored a paper 2 months ago

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

Paper • 2206.15076 • Published Jun 30, 2022 • 3

tasnim

authored a paper 3 months ago

DM-Codec: Distilling Multimodal Representations for Speech Tokenization

Paper • 2410.15017 • Published Oct 19, 2024 • 1

rwightman

posted an update 3 months ago

A new timm release (1.0.11) is out now. I also wrote an article on one of the included models: https://huggingface.co./blog/rwightman/mambaout

Featured in the release are:
* The MambaOut model, a cheeky arch inspired by SSM but without the SSM part, a ConvNeXt with gating.
* Several timm-trained MambaOut variations with arch tweaks and ImageNet-12k pretraining, to verify scaling and supplement the ported weights.
* The smallest MobileNetV4, a 0.5x width scaled Conv-Small.
* Two impressive MobileNetV3 Large models outperforming all previous, using MNV4 Small recipe.
* 'Zepto,' a new compact ConvNeXt variant even smaller than the previous Atto, 2.2M params, RMSNorm, and solid results for its size.
* Newly ported SigLIP SO400M/16 ViT multilingual weights, the largest i18n weights; the previous largest was B/16.
* Two ImageNet-1k fine-tuned SigLIP SO400M models at 378x378
* InternViT 300M weight port. A really solid ViT encoder distilled from the OpenGVLab 6B VL model's encoder.
* An assortment of very small, sub 1M param pretrained test models to improve library unit tests and serve low-resource applications.
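RMSNorm, listed as one of Zepto's ingredients, rescales each feature vector by its root-mean-square and a learned per-channel scale, without subtracting the mean as LayerNorm does. A NumPy sketch normalizing over the last (channel) axis; the axis choice and `eps` value are assumptions, not taken from timm's Zepto code.

```python
import numpy as np


def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm over the last axis: x / sqrt(mean(x**2) + eps) * weight (no mean subtraction)."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight


x = np.array([[3.0, 4.0], [1.0, -1.0]])
w = np.ones(2)  # learnable per-channel scale, initialised to 1
y = rms_norm(x, w)
# Each row now has a root-mean-square of ~1.
print(np.sqrt(np.mean(y * y, axis=-1)))
```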

rwightman

posted an update 4 months ago

A 'small' MobileNet-V4 update, I just pushed weights for the smallest model I've trained in the series, a 0.5 width multiplier version of the MobileNet-V4 Conv Small.

Now you may look at this and say hey, why is this impressive? 64.8% top-1 and 2.2M params? MobileNetV3-Small 0.75 and MobileNet-V2 0.5 both have fewer params (at ~2M) and over 65% top-1, so what gives? Well, this is where MobileNet-V4 differs from the previous versions of the model family: it trades off (gives up) a little parameter efficiency for some computational efficiency.
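A width multiplier like 0.5 scales every layer's channel count, which is then rounded to a hardware-friendly multiple. The sketch below uses the rounding rule from the reference MobileNet implementations (nearest multiple of 8, bumped up one step if rounding down would lose more than 10%). The base widths are hypothetical, not the real MobileNet-V4 Conv-Small config, and whether timm rounds MNV4 channels exactly this way is an assumption.

```python
from typing import List, Optional, Sequence


def make_divisible(v: float, divisor: int = 8, min_value: Optional[int] = None,
                   round_limit: float = 0.9) -> int:
    """Round a channel count to a multiple of `divisor` without losing more than 10%."""
    min_value = min_value or divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < round_limit * v:  # rounding down removed too much: go up one step
        new_v += divisor
    return new_v


def scale_widths(widths: Sequence[int], multiplier: float) -> List[int]:
    """Apply a width multiplier to every stage, then round each result."""
    return [make_divisible(w * multiplier) for w in widths]


base_widths = [32, 40, 96, 128, 256]  # hypothetical stage widths
print(scale_widths(base_widths, 0.5))  # -> [16, 24, 48, 64, 128]
```

Note how 40 x 0.5 = 20 is rounded up to 24 rather than down to 16, since 16 would drop 20% of the channels.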

So, let's look at the speed. On a 4090 w/ torch.compile:
* 98K img/sec - timm/mobilenetv4_conv_small_050.e3000_r224_in1k
* 58K img/sec - timm/mobilenetv3_small_075.lamb_in1k
* 37K img/sec - timm/mobilenetv2_050.lamb_in1k

And there you go, if you have a need for speed, MNV4 is the better option.
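The quoted throughputs are easier to compare as ratios. A trivial sketch using only the figures from the post; batch size and precision aren't stated, so the ratios are the only thing this adds.

```python
# Throughput figures quoted in the post (RTX 4090, torch.compile), in images/sec.
# Batch size and precision are not stated, so only the ratios are meaningful here.
throughput = {
    "mobilenetv4_conv_small_050": 98_000,
    "mobilenetv3_small_075": 58_000,
    "mobilenetv2_050": 37_000,
}

mnv4 = throughput["mobilenetv4_conv_small_050"]
speedup = {name: mnv4 / ips for name, ips in throughput.items()
           if not name.startswith("mobilenetv4")}
for name, ratio in speedup.items():
    print(f"MNV4 0.5x is {ratio:.2f}x the throughput of {name}")
# -> MNV4 0.5x is 1.69x the throughput of mobilenetv3_small_075
# -> MNV4 0.5x is 2.65x the throughput of mobilenetv2_050
```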

rwightman

posted an update 4 months ago

The timm leaderboard timm/leaderboard has been updated with the ability to select different hardware benchmark sets: RTX4090, RTX3090, two different CPUs along with some NCHW / NHWC layout and torch.compile (dynamo) variations.

Also worth pointing out, there are three rather newish 'test' models that you'll see at the top of any samples/sec comparison:
* test_vit ( timm/test_vit.r160_in1k)
* test_efficientnet ( timm/test_efficientnet.r160_in1k)
* test_byobnet ( timm/test_byobnet.r160_in1k, a mix of resnet, darknet, effnet/regnet like blocks)

They are < 0.5M params, insanely fast, and originally intended for unit testing w/ real weights. They have awful ImageNet top-1; it's rare to have anyone bother to train a model this small on ImageNet (the classifier is roughly 30-70% of the param count!). However, they are FAST on very limited hardware and you can fine-tune them well on small data. Could be the model you're looking for?
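The claim that the classifier is a large share of the parameters is easy to sanity-check: an ImageNet-1k linear head on a d-dimensional feature has (d + 1) x 1000 parameters (weights plus biases). The feature widths and total below are hypothetical, picked only to land in the < 0.5M range; they are not the actual test models' configs.

```python
def head_params(num_features: int, num_classes: int = 1000) -> int:
    """Parameters in a linear classifier: weight matrix plus bias vector."""
    return num_features * num_classes + num_classes


TOTAL_PARAMS = 450_000  # hypothetical total, just under the 0.5M mentioned above
for d in (128, 192, 256, 320):  # hypothetical final feature widths
    share = head_params(d) / TOTAL_PARAMS
    print(f"d={d}: head={head_params(d):,} params, {share:.0%} of total")
```

Even a modest 256-dim feature already puts more than half of a 450K-parameter budget into the ImageNet-1k head, which is why swapping in a small-data head for fine-tuning changes these models so much.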

rwightman

posted an update 5 months ago

The latest timm validation & test set results are now viewable in a leaderboard space: timm/leaderboard

As of yesterday, I've updated all of the results for the ImageNet, ImageNet-ReaL, ImageNet-V2, ImageNet-R, ImageNet-A, and Sketch sets. The csv files can be found in the GH repo https://github.com/huggingface/pytorch-image-models/tree/main/results

Unfortunately, the latest benchmark csv files are not yet up to date; there are some gaps between the dataset results and the throughput/FLOP numbers that impact the plots.

h/t to @MohamedRashad for making the first timm leaderboard.
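The gaps amount to models present in one set of CSVs but missing from the other. A stdlib sketch for listing them, assuming both files identify rows by a `model` column; check the actual headers in the results folder before relying on this. The file names in the usage comment are hypothetical.

```python
import csv
from typing import Iterable, List, Set, Tuple


def model_names(lines: Iterable[str], column: str = "model") -> Set[str]:
    """Collect model identifiers from CSV text (a file object or any iterable of lines)."""
    return {row[column] for row in csv.DictReader(lines)}


def coverage_gaps(results: Iterable[str], bench: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (models with accuracy but no throughput, models with throughput but no accuracy)."""
    have_acc = model_names(results)
    have_speed = model_names(bench)
    return sorted(have_acc - have_speed), sorted(have_speed - have_acc)


# Usage with hypothetical local copies of the two CSVs:
# with open("results-imagenet.csv", newline="") as r, open("benchmark.csv", newline="") as b:
#     no_speed, no_acc = coverage_gaps(r, b)
```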

gsarti

authored a paper 5 months ago

Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses

Paper • 2408.00584 • Published Aug 1, 2024 • 6

rwightman

posted an update 6 months ago

I can't resist an opportunity to update an old baseline. Read my new article on my latest look at improving the MobileNet-V1 and EfficientNet-B0 baselines.

https://huggingface.co./blog/rwightman/mobilenet-baselines
timm/mobilenetv1_100.ra4_e3600_r224_in1k
timm/efficientnet_b0.ra4_e3600_r224_in1k

gsarti

authored 2 papers 7 months ago

Multi-property Steering of Large Language Models with Dynamic Activation Composition

Paper • 2406.17563 • Published Jun 25, 2024 • 4

Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation

Paper • 2406.13663 • Published Jun 19, 2024 • 7

rwightman

posted an update 7 months ago

MobileNetV4 weights are now in timm! So far these are the only weights for these models, as the official TensorFlow impl remains weightless.

Guided by paper hparams with a few tweaks, I've managed to match or beat the paper results training the medium models. I'm still working on large and improving the small result. They appear to be solid models for on-device use.

timm/mobilenetv4-pretrained-weights-6669c22cda4db4244def9637

MobileNetV4 -- Universal Models for the Mobile Ecosystem (2404.10518)

Team members 269