Attention mechanism
A potential issue with this encoder–decoder approach is that a neural network needs to be able to compress all the necessary information of a source sentence into a fixed-length vector. This may make it difficult for the neural network to cope with long sentences, especially those that are longer than the sentences in the training corpus.
A limitation of the architecture is that it encodes the input sequence into a fixed-length internal representation. This imposes limits on the length of input sequences that can reasonably be learned and results in worse performance for very long inputs. Is this also true for time series? After all, time-series data are split into fixed-length windows, unlike text, which has variable length. If each time series is divided by a sliding window into fixed-length segments before being fed into the network, is the fixed-length internal representation still a drawback? To measure whether it is, compare: 1. the original code's accuracy against the labels, VS 2. the attention code's accuracy against the labels. Does the fixed-length internal representation change how the model tracks the trend of the time series? Test this on simulated data (see the sliding-window sketch below).
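A minimal sketch of the sliding-window preprocessing assumed above, so that the same fixed-length inputs can be fed to both the plain encoder-decoder and the attention variant for comparison. The helper name, window length, and simulated series are my own illustrative choices, not from the original notes.

```python
import numpy as np

def sliding_windows(series, window_size, step=1):
    """Split a 1-D series into overlapping fixed-length windows.

    Returns an array of shape (num_windows, window_size).
    """
    windows = []
    for start in range(0, len(series) - window_size + 1, step):
        windows.append(series[start:start + window_size])
    return np.array(windows)

# Example: a simulated noisy sine series split into windows of length 50.
series = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.1 * np.random.randn(2000)
X = sliding_windows(series, window_size=50, step=10)
print(X.shape)  # (196, 50)
```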
Attention is the idea of freeing the encoder-decoder architecture from the fixed-length internal representation.
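To make "freeing the architecture from the fixed-length representation" concrete, here is a minimal NumPy sketch of additive (Bahdanau-style) attention for a single decoder step: the decoder builds a context vector as a weighted sum over all encoder time-step outputs instead of relying on one fixed vector. All parameter names and sizes are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

T, hidden = 10, 8                         # encoder time steps, hidden size
enc_outputs = np.random.randn(T, hidden)  # one encoder output per time step
dec_state = np.random.randn(hidden)       # current decoder hidden state

# Learnable parameters (random here, only for illustration)
W_enc = np.random.randn(hidden, hidden)
W_dec = np.random.randn(hidden, hidden)
v = np.random.randn(hidden)

# Alignment scores e_t = v^T tanh(W_enc h_t + W_dec s)
scores = np.tanh(enc_outputs @ W_enc + dec_state @ W_dec) @ v  # shape (T,)
weights = softmax(scores)                                      # attention weights over encoder steps
context = weights @ enc_outputs                                # weighted sum, shape (hidden,)
print(weights.shape, context.shape)
```

The context vector is recomputed at every decoder step, so the decoder can look back at different parts of the input sequence instead of a single compressed summary.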
What is self-attention?
Understanding how it works:
First, the encoder (an LSTM or GRU) processes a sequence of length ei and produces an output of shape ei × hidden_state (details omitted here). At the same time, the hidden state from the encoder's final time step is passed to the decoder as its initial state.
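A minimal Keras sketch of the plain encoder-decoder described above (no attention yet): the encoder returns one output per time step plus its final hidden/cell states, and those states initialise the decoder. The sequence lengths, feature count, and hidden size are assumptions for illustration.

```python
from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Model

T_enc, T_dec, n_features, hidden = 50, 50, 1, 64

# Encoder: outputs per time step (T_enc x hidden) plus final states.
encoder_inputs = Input(shape=(T_enc, n_features))
encoder_outputs, state_h, state_c = LSTM(
    hidden, return_sequences=True, return_state=True)(encoder_inputs)

# Decoder: initialised with the encoder's final hidden/cell states.
decoder_inputs = Input(shape=(T_dec, n_features))
decoder_outputs = LSTM(hidden, return_sequences=True)(
    decoder_inputs, initial_state=[state_h, state_c])
predictions = TimeDistributed(Dense(n_features))(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], predictions)
model.compile(optimizer="adam", loss="mse")
model.summary()
```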
If the attention mechanism is working, can we visualize its weights as a heat map?
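A minimal sketch of such a heat map (the weight matrix here is a random placeholder, not real model output): one row per decoder step, one column per encoder step, so bright cells show which input positions the decoder attended to.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder attention matrix: (decoder steps, encoder steps), rows sum to 1.
attn = np.random.rand(10, 50)
attn = attn / attn.sum(axis=1, keepdims=True)

plt.imshow(attn, aspect="auto", cmap="viridis")
plt.colorbar(label="attention weight")
plt.xlabel("encoder time step")
plt.ylabel("decoder time step")
plt.title("Attention weights")
plt.show()
```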
References:
- How to Develop an Encoder-Decoder Model with Attention in Keras, Jason Brownlee
- Attention in Long Short-Term Memory Recurrent Neural Networks
- What is exactly the attention mechanism introduced to RNN? On Quora, easy to understand
- Dataset: https://github.com/numenta/NAB/tree/master/data
- Dataset: https://www.kaggle.com/data/42760
- An example of how to use the NAB dataset: https://www.awsadvent.com/2018/12/17/time-series-anomaly-detection-with-lstm-and-mxnet/