bird-of-paradise/deepseek-mla
Tags: Text Generation, Transformers, PyTorch, English, deepseek-mla, attention-mechanism, mla, efficient-attention, arxiv:2405.04434
License: mit
Branch: main | 2 contributors | History: 7 commits
Latest commit: bf7364f by bird-of-paradise, "cross referencing other transformer-related implementations", 1 day ago
| Name | Scan | Size | Last commit message | Updated |
|---|---|---|---|---|
| assets | - | - | Initial commit: DeepSeek Multi-Latent Attention implementation | 19 days ago |
| insights | - | - | Fix: Rename to Multi-Head Latent Attention | 17 days ago |
| src | - | - | Update class names to MultiHeadLatentAttention | 17 days ago |
| .DS_Store | Safe | 6.15 kB | Initial commit: DeepSeek Multi-Latent Attention implementation | 19 days ago |
| .gitattributes | Safe | 1.52 kB | initial commit | 19 days ago |
| CONTRIBUTING.md | Safe | 0 Bytes | Initial commit: DeepSeek Multi-Latent Attention implementation | 19 days ago |
| README.md | Safe | 5.01 kB | cross referencing other transformer-related implementations | 1 day ago |