Large Model Study Notes: The Attention Mechanism
This self-attention process is at the core of what makes transformers so powerful: it allows every word (or token) to dynamically adjust its importance based on the surrounding context, leading to a more accurate and nuanced representation as the sequence passes through successive layers of the network.
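As a minimal sketch of this idea, the NumPy example below implements single-head scaled dot-product self-attention over a toy sequence. The sequence length, model dimension, and random weight matrices are arbitrary choices for illustration, not values from any particular model.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q = X @ W_q                                # queries: (seq_len, d_k)
    K = X @ W_k                                # keys:    (seq_len, d_k)
    V = X @ W_v                                # values:  (seq_len, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # similarity of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                         # each token becomes a context-weighted mix of values

# Toy example: 4 tokens, model dimension 8 (sizes chosen purely for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```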
Must-Read Starter Guide to Mastering Attention Mechanisms in Machine Learning
- Soft Attention (contrasted with Hard Attention in the sketch after this list)
- Hard Attention
- Self-Attention
- Global Attention
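Of these, soft and hard attention differ mainly in how the attention weights are used: soft attention takes a differentiable weighted average over all positions, while hard attention selects a single position (typically by sampling), which is not directly differentiable. Below is a minimal NumPy sketch of that contrast; the scores and value vectors are toy data chosen purely for illustration.

```python
import numpy as np

def soft_attention(scores, V):
    """Soft attention: a differentiable, weighted average over all value vectors."""
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over positions
    return weights @ V

def hard_attention(scores, V, rng):
    """Hard attention: pick a single value vector by sampling; not directly differentiable."""
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    idx = rng.choice(len(V), p=weights)       # stochastic selection of one position
    return V[idx]

rng = np.random.default_rng(0)
scores = np.array([0.1, 2.0, -1.0, 0.5])      # toy alignment scores for 4 positions
V = rng.normal(size=(4, 8))                   # toy value vectors
print(soft_attention(scores, V).shape)        # (8,) -- a blend of all positions
print(hard_attention(scores, V, rng).shape)   # (8,) -- one selected position
```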