bird-of-paradise/deepseek-mla
Text Generation
Transformers
PyTorch
English
deepseek-mla
attention-mechanism
mla
efficient-attention
arxiv: 2405.04434
License: mit
deepseek-mla/insights
2 contributors
History: 1 commit
Yan Wei
Initial commit: DeepSeek Multi-Latent Attention implementation
550eb56
20 days ago
architecture.md (3.36 kB): Initial commit: DeepSeek Multi-Latent Attention implementation, 20 days ago
attention_mask.md (1.7 kB): Initial commit: DeepSeek Multi-Latent Attention implementation, 20 days ago