Stefmal

community

Activity Feed

Recent Activity

malteos authored a paper 7 days ago

Tokenizer Choice For LLM Training: Negligible or Crucial?

malteos authored a paper 7 days ago

Towards an Open Platform for Legal Information

malteos authored a paper 7 days ago

Aspect-based Document Similarity for Research Papers

stefmal's activity

stefan-it

posted an update about 21 hours ago

After running some 3DMark and FurMark benchmarks on Windows to make sure that my new 5090 is not causing melting cables [1], and taking some nice shots with a thermal camera (I don't think that's too much), running fine-tuning experiments with my favorite Flair & Transformers libraries is very easy.

Important steps:

A good idea is to start with a fresh Ubuntu 24.04 installation with the latest CUDA 12.8 and the open NVIDIA driver; more advice can be found in [2]:

sudo apt -y install cuda-toolkit-12-8 nvidia-open

I tried updating an existing Ubuntu installation with an older CUDA and driver version, and it resulted in an unbootable system.

If you are using PyTorch 2.6 built with CUDA 12.6, it will result in:

NVIDIA Graphics Device with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.
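The mismatch behind this message can be sketched in a few lines of Python. This is only a simplified heuristic (it compares major compute-capability versions) and not PyTorch's actual compatibility check; the helper name supports_device and the two constants are made up for illustration. The calls torch.cuda.get_arch_list() and torch.cuda.get_device_capability() are only used if PyTorch and a GPU are present.

```python
import re


def supports_device(arch_list, capability):
    """Return True if any compiled 'sm_XY' arch shares the device's major version.

    Simplified heuristic for illustration; real compatibility rules are
    more nuanced than a major-version comparison.
    """
    device_major = capability[0]
    for arch in arch_list:
        match = re.match(r"sm_(\d+)", arch)
        if match and int(match.group(1)) // 10 == device_major:
            return True
    return False


# Arch list copied from the error message above (PyTorch 2.6 built with CUDA 12.6).
TORCH_26_ARCHS = ["sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86", "sm_90"]
# sm_120 from the error message, expressed as (major, minor).
RTX_5090_CAPABILITY = (12, 0)

if __name__ == "__main__":
    # Check the values quoted in the error message.
    print(supports_device(TORCH_26_ARCHS, RTX_5090_CAPABILITY))

    # Optionally check the local installation, if PyTorch and a GPU are available.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        print(supports_device(torch.cuda.get_arch_list(),
                              torch.cuda.get_device_capability(0)))
```

With the arch list from the error message, no entry has major version 12, which is why the sm_120 device is rejected.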

But no worries! For PyTorch, you just need to use a nightly 2.7 version that was built with CUDA 12.8. This can easily be done via:

pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128

After that the latest Flair version can be installed and fine-tuning will work!
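A quick way to confirm that the nightly build is the one actually being imported is to look at the build tag in the version string. The sketch below assumes that PyTorch wheels from the cu128 index carry a "+cu128" suffix in torch.__version__; the helper names cuda_build_tag and is_cu128_build are hypothetical.

```python
def cuda_build_tag(version):
    """Return the local build tag after '+' in a version string, or None."""
    _, sep, local = version.partition("+")
    return local if sep else None


def is_cu128_build(version):
    """Assumption: wheels built with CUDA 12.8 are tagged '+cu128'."""
    return cuda_build_tag(version) == "cu128"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        # torch.version.cuda reports the CUDA version the wheel was built with.
        print(torch.__version__, torch.version.cuda,
              is_cu128_build(torch.__version__))
```

If the last printed value is False, the environment is probably still picking up an older PyTorch build.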

References:

[1]: https://www.reddit.com/r/nvidia/comments/1inpox7/rtx_50_series_12vhpwr_megathread/
[2]: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_network

stefan-it

posted an update 4 days ago

She arrived 😍

[Expect more models soon...]

malteos

authored 10 papers 7 days ago

Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings

Paper • 2202.06671 • Published Feb 14, 2022 • 2

Specialized Document Embeddings for Aspect-based Similarity of Research Papers

Paper • 2203.14541 • Published Mar 28, 2022

Investigating Gender Bias in Turkish Language Models

Paper • 2404.11726 • Published Apr 17, 2024 • 1

Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning

Paper • 2301.09626 • Published Jan 23, 2023 • 2

Progress Report: Towards European LLMs

Paper • 2410.03730 • Published Sep 30, 2024 • 2

Data Processing for the OpenGPT-X Model Family

Paper • 2410.08800 • Published Oct 11, 2024 • 1

MMTEB: Massive Multilingual Text Embedding Benchmark

Paper • 2502.13595 • Published 9 days ago • 31

stefan-it

posted an update 3 months ago

My latest project is the outcome of the last 2+ years working with TPUs from the amazing TPU Research Cloud (TRC) program and training Encoder-only LMs with the TensorFlow Model Garden library.

👉 Link: https://github.com/stefan-it/model-garden-lms

An overview of some features:

- Cheatsheet for setting up a TPU VM Pod (with all necessary dependencies) to pretrain LMs with TF Model Garden
- Conversion scripts that convert TF Model Garden weights to Hugging Face Transformers-compatible models
- Supported architectures include BERT, BERT with Token Dropping and TEAMS

I also released BERT-based models pretrained on the great Hugging Face FineWeb and FineWeb-Edu datasets (10BT subset). With more to come!

👉 Model Hub Link: https://huggingface.co./model-garden-lms

If you find these resources useful, please give them a like!

Made from Bavarian Oberland with ❤️ and 🥨.