Large-model study notes: the attention mechanism

This self-attention process is at the core of what makes transformers so powerful. It lets every word (or token) dynamically adjust how much weight it gives to its surrounding context, and as the representation passes through multiple layers of the network, this produces an increasingly accurate and nuanced understanding of the input.

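To make the idea concrete, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. It is not the post's own code; the function name, the projection matrices `w_q`, `w_k`, `w_v`, and the toy dimensions are all illustrative assumptions. The key point it shows is that each token's output is a weighted mix of every token's value vector, with the weights computed from the token's own query against all keys.

```python
import numpy as np

def scaled_dot_product_self_attention(x, w_q, w_k, w_v):
    """Minimal single-head self-attention over a sequence of token embeddings.

    x:             (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices (illustrative, randomly initialized below)
    """
    q = x @ w_q                      # queries: what each token is looking for
    k = x @ w_k                      # keys: what each token offers to others
    v = x @ w_v                      # values: the content that gets mixed
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # pairwise relevance between tokens, scaled
    # softmax over the key dimension -> each row of weights sums to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v               # one context-weighted vector per token

# Toy example (hypothetical sizes): 4 tokens, 8-dim embeddings, 8-dim head
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): each token now carries context from the whole sequence
```

Because the attention weights are recomputed from the input itself, the same token can attend to different neighbors in different sentences, which is the "dynamic adjustment" described above; a full transformer stacks many such layers (with multiple heads) on top of each other.
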
posted @ 2024-11-24 11:54  dudu