Question about context_length and lags_sequence
Hello, thank you for contributing an interesting model.
I saw in another post that you plan to release a notebook, however I am not very patient :D.
If you have the time to answer, I would like to ask a question about the dimensionality of the inputs. I see that the model always expects past_values to have shape (batch_size, context_length + max(lags_sequence)). Does this mean that I have to provide max(lags_sequence) additional time steps in my inputs?
Yes, I believe so. Do have a look at the blog post here for further details: https://huggingface.co./blog/time-series-transformers
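In case it helps before the notebook lands, here's a quick sketch of the arithmetic (using the default monthly lags from GluonTS, as in the blog post):

```python
context_length = 24
lags_sequence = [1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 23, 24, 25, 35, 36, 37]

# past_values needs to cover the context window, plus enough extra history
# to look back max(lags_sequence) steps from the first context position:
required_past_length = context_length + max(lags_sequence)
print(required_past_length)  # 61
```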
Hi,
The way lags are created was actually a bit confusing for me at first - let me explain.
So for instance when training on the "tourism-monthly" dataset, the frequency of the data is "monthly", and as explained in the blog, we use the default lags provided by GluonTS:
```python
from gluonts.time_feature import get_lags_for_frequency

freq = "1M"
lags_sequence = get_lags_for_frequency(freq)
print(lags_sequence)
# [1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 23, 24, 25, 35, 36, 37]
```
This means that, for each time step of a time series that we feed to the Transformer, we also add the value of 1 month before, the value of 2 months before, and so on up to 37 months before, as additional features (note that the list skips some lags, e.g. 8, 9 and 10).
So for a given month, the model gathers the values at those 16 lagged positions into one vector. Each time step is thus represented by a feature vector of size len(lags_sequence) = 16 (time features and static features get concatenated on top of that), and it is this vector that, after a linear projection to the hidden size, goes through the Transformer.
The inputs that we feed to the model (past_values) must therefore contain the extra history needed to build these lags; internally, the model moves the lagged values into the feature dimension, so that the sequence length seen by the Transformer is still equal to the context length.
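Here is a minimal sketch of that "squashing" step (this mirrors the idea behind the model's internal get_lagged_subsequences, not its exact code):

```python
import torch

lags_sequence = [1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 23, 24, 25, 35, 36, 37]
context_length = 24
past_length = context_length + max(lags_sequence)  # 61

past_values = torch.randn(2, past_length)  # (batch_size, 61)

# For each lag l, take the window of length context_length that ends
# l steps before the end of past_values, and stack the windows as features:
lagged = torch.stack(
    [
        past_values[:, past_length - lag - context_length : past_length - lag]
        for lag in lags_sequence
    ],
    dim=-1,
)
print(lagged.shape)  # torch.Size([2, 24, 16]) = (batch_size, context_length, len(lags_sequence))
```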
So let's say we use a context length of 24 months and the 16 lags above (max lag = 37). The past_values we feed to the model then have shape (batch_size, context_length + max(lags_sequence)) = (batch_size, 24 + 37) = (batch_size, 61). Internally, the model turns this into a tensor of shape (batch_size, sequence_length, num_features) = (batch_size, 24, 16) - assuming the lags are the only "features" we use - which is then projected to (batch_size, 24, d_model) and sent through the Transformer encoder.
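If you want to sanity-check these shapes end-to-end, something along these lines should work (an untested sketch with random data purely for shape-checking, assuming a transformers version that includes the model; I'm leaving the static/categorical features out for simplicity):

```python
import torch
from transformers import TimeSeriesTransformerConfig, TimeSeriesTransformerModel

lags_sequence = [1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 23, 24, 25, 35, 36, 37]
config = TimeSeriesTransformerConfig(
    prediction_length=12,
    context_length=24,
    lags_sequence=lags_sequence,
    num_time_features=1,
)
model = TimeSeriesTransformerModel(config)

batch_size = 2
past_length = config.context_length + max(config.lags_sequence)  # 24 + 37 = 61

outputs = model(
    past_values=torch.randn(batch_size, past_length),
    past_time_features=torch.randn(batch_size, past_length, config.num_time_features),
    past_observed_mask=torch.ones(batch_size, past_length),
    future_values=torch.randn(batch_size, config.prediction_length),
    future_time_features=torch.randn(batch_size, config.prediction_length, config.num_time_features),
)
# The encoder only ever sees context_length time steps:
print(outputs.encoder_last_hidden_state.shape)  # (2, 24, d_model)
```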
Hope that makes it clear!