Ersi Ni

nilbot


AI & ML interests

Transformers

nilbot's activity

updated a collection 9 days ago

towards AGI

7 items • Updated 9 days ago

upvoted a paper 9 days ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published 12 days ago • 134

upvoted an article 23 days ago

Article

G2P Shrinks Speech Models

23 days ago • 27

liked a model 23 days ago

hexgrad/Kokoro-82M

Text-to-Speech • Updated 1 day ago • 1.29M • 3.47k

updated a collection about 1 month ago

towards AGI

7 items • Updated 9 days ago

upvoted a paper about 1 month ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 334

upvoted an article about 1 month ago

Article

MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era

Jan 15 • 41

upvoted an article about 2 months ago

Article

🌁#82: AI and ML in Real Life

Jan 7 • 16

updated a collection 3 months ago

Inbox

4 items • Updated Nov 25, 2024

upvoted a paper 3 months ago

UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages

Paper • 2411.14343 • Published Nov 21, 2024 • 7

liked a dataset 4 months ago

v2ray/anime-collection

Updated Nov 9, 2024 • 101 • 5

liked a model 4 months ago

mistralai/Ministral-8B-Instruct-2410

Updated Dec 6, 2024 • 45.3k • 436

liked a dataset 4 months ago

neuralwork/arxiver

Viewer • Updated Nov 1, 2024 • 63.4k • 725 • 359

liked 2 models 4 months ago

deepseek-ai/Janus-1.3B

Any-to-Any • Updated Jan 27 • 176k • 578

nvidia/Llama-3.1-Nemotron-70B-Instruct-HF

Text Generation • Updated Oct 25, 2024 • 114k • 2.02k

replied to PLB's post 4 months ago

Interesting, but how does this approach generalize to arbitrary user query / document domains? Would you need to train a separate network for each domain / dataset?

updated a collection 5 months ago

Inbox

4 items • Updated Nov 25, 2024