Joseph (Joseph717171)

AI & ML interests: None yet


Organizations: Hugging Face Discord Community

Joseph717171's activity

reacted to Tonic's post with 🚀🔥 2 days ago
Microsoft just released Phi-4; check it out here: Tonic/Phi-4

hope you like it :-)
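
If you want to try Phi-4 outside the Space, here is a minimal local sketch using the transformers pipeline API. The model id microsoft/phi-4 is an assumption here (the linked Space is a hosted demo), so check the release for the exact repo name.

```python
# Minimal sketch of running Phi-4 locally (assumes the weights are
# published under microsoft/phi-4; verify the actual model id first).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/phi-4",
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # requires `accelerate`; places layers on available devices
)
out = generator("Explain byte-level tokenization in one sentence.",
                max_new_tokens=64)
print(out[0]["generated_text"])
```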
reacted to singhsidhukuldeep's post with 🚀🧠 21 days ago
Exciting breakthrough in AI: @Meta's new Byte Latent Transformer (BLT) revolutionizes language models by eliminating tokenization!

The BLT architecture introduces a groundbreaking approach that processes raw bytes instead of tokens, achieving state-of-the-art performance while being more efficient and robust. Here's what makes it special:

>> Key Innovations
Dynamic Patching: BLT groups bytes into variable-sized patches based on entropy, allocating more compute power where the data is more complex. This results in up to 50% fewer FLOPs during inference compared to traditional token-based models.
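
To make the idea concrete, here is a minimal sketch of entropy-driven patching. It is illustrative, not the paper's implementation: the per-byte entropy estimates and the threshold are made-up inputs (in BLT they come from a small byte-level language model).

```python
# Minimal sketch of entropy-based dynamic patching (illustrative only).
# A new patch starts whenever the next byte's predicted entropy crosses
# a threshold, so hard-to-predict regions get more, smaller patches.
from typing import List


def segment_into_patches(data: bytes, byte_entropies: List[float],
                         threshold: float = 2.0) -> List[bytes]:
    """byte_entropies[i] is an entropy estimate (bits) for data[i];
    both the estimates and the threshold here are assumptions."""
    patches, start = [], 0
    for i in range(1, len(data)):
        if byte_entropies[i] > threshold:  # surprising byte: open a new patch
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches


# Toy usage: a predictable run of bytes followed by a "surprising" region.
text = b"aaaaaaaaXYZ"
entropies = [0.1] * 8 + [3.5, 3.2, 3.1]  # made-up numbers
print(segment_into_patches(text, entropies))  # [b'aaaaaaaa', b'X', b'Y', b'Z']
```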

Three-Component Architecture (a toy version in code follows the list):
• Lightweight Local Encoder that converts bytes to patch representations
• Powerful Global Latent Transformer that processes patches
• Local Decoder that converts patches back to bytes
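
Here is a rough PyTorch sketch of how the three pieces could fit together. All dimensions, layer counts, and the mean-pooling used to form patch representations are illustrative assumptions; the paper pools bytes into patches with cross-attention rather than a plain mean.

```python
# Schematic of the three-component layout (sizes are illustrative, not
# the paper's configuration).
import torch
import torch.nn as nn


class ToyBLT(nn.Module):
    def __init__(self, d_local=256, d_global=1024, vocab=256):
        super().__init__()
        self.byte_emb = nn.Embedding(vocab, d_local)
        # 1) Lightweight local encoder: shallow transformer over raw bytes.
        enc = nn.TransformerEncoderLayer(d_local, nhead=4, batch_first=True)
        self.local_encoder = nn.TransformerEncoder(enc, num_layers=2)
        self.to_global = nn.Linear(d_local, d_global)
        # 2) Powerful global latent transformer over patch representations.
        glob = nn.TransformerEncoderLayer(d_global, nhead=8, batch_first=True)
        self.global_model = nn.TransformerEncoder(glob, num_layers=12)
        # 3) Local decoder mapping patch states back to byte logits.
        self.local_decoder = nn.Linear(d_global, vocab)

    def forward(self, byte_ids, patch_bounds):
        h = self.local_encoder(self.byte_emb(byte_ids))      # (B, T, d_local)
        # Mean-pool each patch into one vector (crude stand-in for the
        # paper's cross-attention pooling).
        patches = torch.stack([h[:, s:e].mean(dim=1)
                               for s, e in patch_bounds], dim=1)
        z = self.global_model(self.to_global(patches))       # (B, P, d_global)
        return self.local_decoder(z)                         # (B, P, vocab)


model = ToyBLT()
ids = torch.randint(0, 256, (1, 12))
print(model(ids, patch_bounds=[(0, 8), (8, 12)]).shape)  # torch.Size([1, 2, 256])
```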

>> Technical Advantages
• Matches performance of Llama 3 at 8B parameters while being more efficient
• Superior handling of non-English languages and rare character sequences
• Remarkable 99.9% accuracy on spelling tasks
• Better scaling properties than token-based models

>> Under the Hood
The system uses an entropy model to determine patch boundaries, cross-attention mechanisms for information flow, and hash n-gram embeddings for improved representation. The architecture allows simultaneous scaling of both patch and model size while maintaining fixed inference costs.
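
As a rough illustration of the hash n-gram idea: each byte position gets extra embeddings for the n-grams ending there, looked up by hashing into a fixed-size table. The table size, n-gram orders, and hash function below are assumptions, and Python's salted hash() stands in for the deterministic rolling hash a real system would use.

```python
# Sketch of hash n-gram embeddings (table size, n-gram orders, and the
# hash function are illustrative assumptions).
import torch
import torch.nn as nn


class HashNGramEmbedding(nn.Module):
    def __init__(self, table_size=100_003, dim=256, orders=(3, 4, 5)):
        super().__init__()
        self.table = nn.Embedding(table_size, dim)
        self.table_size = table_size
        self.orders = orders

    def forward(self, data: bytes) -> torch.Tensor:
        out = torch.zeros(len(data), self.table.embedding_dim)
        for i in range(len(data)):
            for n in self.orders:
                if i + 1 >= n:
                    gram = data[i + 1 - n:i + 1]  # n-gram ending at byte i
                    # Python's hash() is process-salted; a real system
                    # would use a fixed rolling hash instead.
                    idx = hash(gram) % self.table_size
                    out[i] += self.table(torch.tensor(idx))
        return out  # added to byte embeddings before the local encoder


emb = HashNGramEmbedding()
print(emb(b"hello world").shape)  # torch.Size([11, 256])
```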

This is a game-changer for multilingual AI and could reshape how we build future language models. Excited to see how this technology evolves!
  • 2 replies
Ā·
reacted to davanstrien's post with 🔥 22 days ago
Introducing FineWeb-C 🌍🎓, a community-built dataset for improving language models in ALL languages.

Inspired by FineWeb-Edu, the community is labelling the educational quality of texts in many languages.

318 annotators, 32K+ annotations, 12 languages - and growing! 🌍

data-is-better-together/fineweb-c
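
A minimal sketch of loading the annotations with the datasets library; the config name below is an assumption, so check the dataset card for the actual per-language configurations and column names.

```python
# Minimal sketch: load one language's annotations from FineWeb-C.
# The "dan_Latn" config name is an assumption; see the dataset card
# at data-is-better-together/fineweb-c for the real config list.
from datasets import load_dataset

ds = load_dataset("data-is-better-together/fineweb-c", "dan_Latn", split="train")
print(ds[0])  # inspect one annotated example
```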