Commit 098730b
Parent(s): 1919884

Fix: Rename to Multi-Head Latent Attention

Files changed:
- README.md +3 -3
- insights/architecture.md +1 -1
README.md CHANGED

@@ -13,7 +13,7 @@ license: mit
 
 # DeepSeek Multi-Latent Attention
 
-This repository provides a PyTorch implementation of the Multi-Latent Attention (MLA) mechanism introduced in the DeepSeek-V2 paper. **This is not a trained model, but rather a modular attention implementation** that significantly reduces KV cache for efficient inference while maintaining model performance through its innovative architecture. It can be used as a drop-in attention module in transformer architectures.
+This repository provides a PyTorch implementation of the Multi-Head Latent Attention (MLA) mechanism introduced in the DeepSeek-V2 paper. **This is not a trained model, but rather a modular attention implementation** that significantly reduces KV cache for efficient inference while maintaining model performance through its innovative architecture. It can be used as a drop-in attention module in transformer architectures.
 
 ## Key Features
 
@@ -33,10 +33,10 @@ Or download directly from the HuggingFace repository page.
 
 ```python
 import torch
-from src.mla import
+from src.mla import MultiHeadLatentAttention
 
 # Initialize MLA
-mla =
+mla = MultiHeadLatentAttention(
     d_model=512,   # Model dimension
     num_head=8,    # Number of attention heads
     d_embed=512,   # Embedding dimension
insights/architecture.md CHANGED

@@ -1,4 +1,4 @@
-# Advanced Insights: Multi-Latent Attention Architecture
+# Advanced Insights: Multi-Head Latent Attention Architecture
 
 ## Key Architectural Innovations
 
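For context on the claim in the README that MLA "significantly reduces KV cache", below is a minimal, self-contained sketch of the latent KV-compression idea from the DeepSeek-V2 paper. It is not the repository's `MultiHeadLatentAttention` implementation: the class name, the `d_latent` size, and the split into `kv_down`/`k_up`/`v_up` projections are illustrative assumptions, and causal masking and RoPE handling are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentKVAttentionSketch(nn.Module):
    """Illustrative sketch of MLA-style KV compression (not the repo's code):
    keys/values are compressed into a small shared latent that is cached,
    then up-projected per head at attention time."""

    def __init__(self, d_model=512, num_head=8, d_latent=64):
        super().__init__()
        self.num_head = num_head
        self.d_head = d_model // num_head
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compression; only this output is cached
        self.k_up = nn.Linear(d_latent, d_model)     # reconstruct per-head keys from the latent
        self.v_up = nn.Linear(d_latent, d_model)     # reconstruct per-head values from the latent
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        b, t, _ = x.shape
        latent = self.kv_down(x)                      # (b, t, d_latent) -- the small cache entry
        if latent_cache is not None:
            latent = torch.cat([latent_cache, latent], dim=1)
        q = self.q_proj(x).view(b, t, self.num_head, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, -1, self.num_head, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.num_head, self.d_head).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)  # standard attention over reconstructed K/V
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out), latent                # return the latent to cache for the next step


# Dummy usage: the cache grows by d_latent (=64) values per token instead of
# 2 * d_model (=1024) for conventional multi-head attention keys plus values.
mla = LatentKVAttentionSketch()
x = torch.randn(2, 16, 512)
out, cache = mla(x)
print(out.shape, cache.shape)  # torch.Size([2, 16, 512]) torch.Size([2, 16, 64])
```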