PatchTST
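Overview
The PatchTST model was proposed in A Time Series is Worth 64 Words: Long-term Forecasting with Transformers by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam.
At a high level, the model vectorizes a time series into patches of a given size, encodes the resulting sequence of vectors with a Transformer, and then outputs a forecast of the prediction length through an appropriate head. The model is illustrated in the following figure:
The abstract from the paper is the following: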
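*We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components:
(i) segmentation of time series into subseries-level patches, which serve as input tokens to the Transformer; (ii) channel independence, where each channel contains a single univariate time series that shares the same embedding and Transformer weights across all the series. The patching design naturally has a three-fold benefit:
- local semantic information is retained in the embedding;
- computation and memory usage of the attention maps are quadratically reduced given the same look-back window;
- the model can attend to a longer history.
Our channel-independent patch time series Transformer (PatchTST) can improve long-term forecasting accuracy significantly when compared with SOTA Transformer-based models. We also apply our model to self-supervised pre-training tasks and attain excellent fine-tuning performance, which outperforms supervised training on large datasets. Transferring masked pre-trained representations from one dataset to others also produces SOTA forecasting accuracy.*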
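The quadratic saving claimed in the abstract can be checked with a little arithmetic. The sketch below is illustrative only: the look-back window, patch length and stride are example values, and the patch count assumes a sliding window without padding, which may differ from the padding scheme used in the paper.

```python
def num_patches(context_length: int, patch_length: int, patch_stride: int) -> int:
    """Number of windows obtained by sliding over the series without padding."""
    return (context_length - patch_length) // patch_stride + 1


def attention_map_entries(num_tokens: int) -> int:
    """Self-attention scores every token against every token."""
    return num_tokens * num_tokens


# Illustrative values, not taken from any particular checkpoint.
context_length = 512
patch_length = 16
patch_stride = 8

point_tokens = context_length  # one token per time step
patch_tokens = num_patches(context_length, patch_length, patch_stride)

print("per-step tokens:", point_tokens, "attention entries:", attention_map_entries(point_tokens))
print("patch tokens:", patch_tokens, "attention entries:", attention_map_entries(patch_tokens))
```

With these example values the attention map shrinks from 512 x 512 entries to 63 x 63 entries for the same look-back window, which is what lets the model afford a longer history.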
This model was contributed by namctin, gsinthong, diepi, vijaye12, wmgifford, and kashif. The original code can be found here.
Usage tips
The model can also be used for time series classification and time series regression. See the PatchTSTForClassification and PatchTSTForRegression classes, respectively.
Resources
- A blog post explaining PatchTST in depth can be found here. The blog can also be opened in Google Colab.
PatchTSTConfig
class transformers.PatchTSTConfig
(
    num_input_channels: int = 1,
    context_length: int = 32,
    distribution_output: str = 'student_t',
    loss: str = 'mse',
    patch_length: int = 1,
    patch_stride: int = 1,
    num_hidden_layers: int = 3,
    d_model: int = 128,
    num_attention_heads: int = 4,
    share_embedding: bool = True,
    channel_attention: bool = False,
    ffn_dim: int = 512,
    norm_type: str = 'batchnorm',
    norm_eps: float = 1e-05,
    attention_dropout: float = 0.0,
    positional_dropout: float = 0.0,
    path_dropout: float = 0.0,
    ff_dropout: float = 0.0,
    bias: bool = True,
    activation_function: str = 'gelu',
    pre_norm: bool = True,
    positional_encoding_type: str = 'sincos',
    use_cls_token: bool = False,
    init_std: float = 0.02,
    share_projection: bool = True,
    scaling: Union = 'std',
    do_mask_input: Optional = None,
    mask_type: str = 'random',
    random_mask_ratio: float = 0.5,
    num_forecast_mask_patches: Union = [2],
    channel_consistent_masking: Optional = False,
    unmasked_channel_indices: Optional = None,
    mask_value: int = 0,
    pooling_type: str = 'mean',
    head_dropout: float = 0.0,
    prediction_length: int = 24,
    num_targets: int = 1,
    output_range: Optional = None,
    num_parallel_samples: int = 100,
    **kwargs
)
Parameters
- num_input_channels (int, optional, defaults to 1) — The size of the target variable, which by default is 1 for univariate targets. It is greater than 1 for multivariate targets.
- context_length (int, optional, defaults to 32) — The context length of the input sequence.
- distribution_output (str, optional, defaults to "student_t") — The distribution emission head for the model when loss is "nll". Can be "student_t", "normal" or "negative_binomial".
- loss (str, optional, defaults to "mse") — The loss function for the model, corresponding to the distribution_output head. For parametric distributions it is the negative log likelihood ("nll") and for point estimates it is the mean squared error ("mse").
- patch_length (int, optional, defaults to 1) — The patch length of the patchification process.
- patch_stride (int, optional, defaults to 1) — The stride of the patchification process.
- num_hidden_layers (int, optional, defaults to 3) — Number of hidden layers.
- d_model (int, optional, defaults to 128) — Dimensionality of the transformer layers.
- num_attention_heads (int, optional, defaults to 4) — Number of attention heads for each attention layer in the Transformer encoder.
- share_embedding (bool, optional, defaults to True) — Whether to share the input embedding across all channels.
- channel_attention (bool, optional, defaults to False) — Whether to activate the channel attention block in the Transformer, which allows channels to attend to each other.
- ffn_dim (int, optional, defaults to 512) — Dimension of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
- norm_type (str, optional, defaults to "batchnorm") — Normalization at each Transformer layer. Can be "batchnorm" or "layernorm".
- norm_eps (float, optional, defaults to 1e-05) — A value added to the denominator for numerical stability of normalization.
- attention_dropout (float, optional, defaults to 0.0) — The dropout probability for the attention probabilities.
- positional_dropout (float, optional, defaults to 0.0) — The dropout probability in the positional embedding layer.
- path_dropout (float, optional, defaults to 0.0) — The dropout path in the residual block.
- ff_dropout (float, optional, defaults to 0.0) — The dropout probability used between the two layers of the feed-forward networks.
- bias (bool, optional, defaults to True) — Whether to add bias in the feed-forward networks.
- activation_function (str, optional, defaults to "gelu") — The non-linear activation function (string) in the Transformer. "gelu" and "relu" are supported.
- pre_norm (bool, optional, defaults to True) — Normalization is applied before self-attention if pre_norm is set to True. Otherwise, normalization is applied after the residual block.
- positional_encoding_type (str, optional, defaults to "sincos") — Positional encodings. Options "random" and "sincos" are supported.
- use_cls_token (bool, optional, defaults to False) — Whether a cls token is used.
- init_std (float, optional, defaults to 0.02) — The standard deviation of the truncated normal weight initialization distribution.
- share_projection (bool, optional, defaults to True) — Whether to share the projection layer across different channels in the forecast head.
- scaling (Union, optional, defaults to "std") — Whether to scale the input targets via the "mean" scaler, the "std" scaler, or no scaler if None. If True, the scaler is set to "mean".
- do_mask_input (bool, optional) — Whether to apply masking during pretraining.
- mask_type (str, optional, defaults to "random") — Masking type. Only "random" and "forecast" are currently supported.
- random_mask_ratio (float, optional, defaults to 0.5) — Masking ratio applied to the input data during random pretraining.
- num_forecast_mask_patches (int or list, optional, defaults to [2]) — Number of patches to be masked at the end of each batch sample. If it is an integer, all the samples in the batch will have the same number of masked patches. If it is a list, samples in the batch will be randomly masked by numbers defined in the list. This argument is only used for forecast pretraining.
- channel_consistent_masking (bool, optional, defaults to False) — If True, all the channels will have the same masking pattern.
- unmasked_channel_indices (list, optional) — Indices of channels that are not masked during pretraining. Values in the list are numbers between 1 and num_input_channels.
- mask_value (int, optional, defaults to 0) — Values in the masked patches will be filled by mask_value.
- pooling_type (str, optional, defaults to "mean") — Pooling of the embedding. "mean", "max" and None are supported.
- head_dropout (float, optional, defaults to 0.0) — The dropout probability for the head.
- prediction_length (int, optional, defaults to 24) — The prediction horizon that the model will output.
- num_targets (int, optional, defaults to 1) — Number of targets for regression and classification tasks. For classification, it is the number of classes.
- output_range (list, optional) — Output range for the regression task. The range of output values can be set to force the model to produce values within a range.
- num_parallel_samples (int, optional, defaults to 100) — The number of samples generated in parallel for probabilistic prediction.
This is the configuration class to store the configuration of a PatchTSTModel. It is used to instantiate a PatchTST model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the ibm/patchtst architecture.
Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.
>>> from transformers import PatchTSTConfig, PatchTSTModel
>>> # Initializing a PatchTST configuration with 12 time steps for prediction
>>> configuration = PatchTSTConfig(prediction_length=12)
>>> # Randomly initializing a model (with random weights) from the configuration
>>> model = PatchTSTModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
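To make the roles of patch_length and patch_stride concrete, here is a minimal pure-Python sketch of slicing one univariate series into patches. The helper make_patches is hypothetical and not part of the library; it assumes windows start at the first time step and that any trailing remainder is dropped, so the library's own alignment may differ.

```python
from typing import List


def make_patches(series: List[float], patch_length: int, patch_stride: int) -> List[List[float]]:
    """Slice a univariate series into (possibly overlapping) fixed-size windows."""
    if patch_length > len(series):
        raise ValueError("patch_length must not exceed the series length")
    last_start = len(series) - patch_length
    return [
        series[start:start + patch_length]
        for start in range(0, last_start + 1, patch_stride)
    ]


# A toy series with 10 time steps: 0.0, 1.0, ..., 9.0
series = [float(t) for t in range(10)]

# Windows of 4 steps that advance by 2 steps, so neighbours overlap by 2.
patches = make_patches(series, patch_length=4, patch_stride=2)
for patch in patches:
    print(patch)
```

Each patch becomes one input token, so a larger stride yields fewer tokens, while a stride smaller than the patch length makes consecutive tokens overlap.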
PatchTSTModel
class transformers.PatchTSTModel
( config: PatchTSTConfig )
Parameters
- config (PatchTSTConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The bare PatchTST Model outputting raw hidden-states without any specific head. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
( past_values: Tensor, past_observed_mask: Optional = None, future_values: Optional = None, output_hidden_states: Optional = None, output_attentions: Optional = None, return_dict: Optional = None )
Parameters
- past_values (torch.Tensor of shape (batch_size, sequence_length, num_input_channels), required) — Input sequence to the model.
- past_observed_mask (torch.BoolTensor of shape (batch_size, sequence_length, num_input_channels), optional) — Boolean mask to indicate which past_values were observed and which were missing. Mask values selected in [0, 1]:
  - 1 for values that are observed,
  - 0 for values that are missing (i.e. NaNs that were replaced by zeros).
- future_values (torch.Tensor of shape (batch_size, prediction_length, num_input_channels), optional) — Future target values associated with the past_values.
- output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers.
- output_attentions (bool, optional) — Whether or not to return the output attention of all layers.
- return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
Examples:
>>> from huggingface_hub import hf_hub_download
>>> import torch
>>> from transformers import PatchTSTModel
>>> file = hf_hub_download(
... repo_id="hf-internal-testing/etth1-hourly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)
>>> model = PatchTSTModel.from_pretrained("namctin/patchtst_etth1_pretrain")
>>> # during training, one provides both past and future values
>>> outputs = model(
... past_values=batch["past_values"],
... future_values=batch["future_values"],
... )
>>> last_hidden_state = outputs.last_hidden_state
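The past_observed_mask argument pairs with inputs whose missing values were replaced by zeros. A minimal sketch of preparing both from raw data is shown below; fill_missing is a hypothetical helper written for illustration, and it works on nested Python lists for a single sample rather than on tensors.

```python
import math
from typing import List, Tuple


def fill_missing(values: List[List[float]]) -> Tuple[List[List[float]], List[List[bool]]]:
    """Replace NaNs with 0.0 and return a mask that is True where a value was observed."""
    filled: List[List[float]] = []
    mask: List[List[bool]] = []
    for row in values:
        observed = [not math.isnan(v) for v in row]
        filled.append([v if ok else 0.0 for v, ok in zip(row, observed)])
        mask.append(observed)
    return filled, mask


nan = float("nan")
# One sample laid out as (sequence_length=3, num_input_channels=2).
raw = [[1.0, nan], [2.0, 5.0], [nan, 6.0]]

past_values, past_observed_mask = fill_missing(raw)
print(past_values)
print(past_observed_mask)
```

Before calling the model, both lists would still need to be converted to tensors and given a leading batch dimension, with the mask kept as a boolean tensor.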
PatchTSTForPrediction
class transformers.PatchTSTForPrediction
( config: PatchTSTConfig )
Parameters
- config (PatchTSTConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The PatchTST for prediction model. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
( past_values: Tensor, past_observed_mask: Optional = None, future_values: Optional = None, output_hidden_states: Optional = None, output_attentions: Optional = None, return_dict: Optional = None )
Parameters
- past_values (torch.Tensor of shape (batch_size, sequence_length, num_input_channels), required) — Input sequence to the model.
- past_observed_mask (torch.BoolTensor of shape (batch_size, sequence_length, num_input_channels), optional) — Boolean mask to indicate which past_values were observed and which were missing. Mask values selected in [0, 1]:
  - 1 for values that are observed,
  - 0 for values that are missing (i.e. NaNs that were replaced by zeros).
- future_values (torch.Tensor of shape (batch_size, prediction_length, num_input_channels), optional) — Future target values associated with the past_values.
- output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers.
- output_attentions (bool, optional) — Whether or not to return the output attention of all layers.
- return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
Examples:
>>> from huggingface_hub import hf_hub_download
>>> import torch
>>> from transformers import PatchTSTConfig, PatchTSTForPrediction
>>> file = hf_hub_download(
... repo_id="hf-internal-testing/etth1-hourly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)
>>> # Prediction task with 7 input channels and prediction length is 96
>>> model = PatchTSTForPrediction.from_pretrained("namctin/patchtst_etth1_forecast")
>>> # during training, one provides both past and future values
>>> outputs = model(
... past_values=batch["past_values"],
... future_values=batch["future_values"],
... )
>>> loss = outputs.loss
>>> loss.backward()
>>> # during inference, one only provides past values, the model outputs future values
>>> outputs = model(past_values=batch["past_values"])
>>> prediction_outputs = outputs.prediction_outputs
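The scaling option of PatchTSTConfig normalizes the inputs, and forecasts are then mapped back to the original units. The sketch below illustrates the idea for a single channel under stated assumptions: the mean and standard deviation are computed over the context window, and a small eps is added to the variance for stability. The helpers std_scale and unscale are hypothetical, and the library's scaler may differ in detail.

```python
import math
from typing import List, Tuple


def std_scale(context: List[float], eps: float = 1e-5) -> Tuple[List[float], float, float]:
    """Standardize one channel; returns the scaled values plus the mean and scale used."""
    mean = sum(context) / len(context)
    variance = sum((x - mean) ** 2 for x in context) / len(context)
    scale = math.sqrt(variance + eps)  # eps keeps the scale non-zero for constant series
    return [(x - mean) / scale for x in context], mean, scale


def unscale(values: List[float], mean: float, scale: float) -> List[float]:
    """Map model outputs back to the original units of the series."""
    return [v * scale + mean for v in values]


context = [2.0, 4.0, 6.0, 8.0]
scaled, mean, scale = std_scale(context)
restored = unscale(scaled, mean, scale)

print(scaled)
print(restored)
```

Keeping the mean and scale alongside the scaled input is the important part: without them a forecast made in normalized space cannot be compared with the raw future values.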
PatchTSTForClassification
class transformers.PatchTSTForClassification
( config: PatchTSTConfig )
Parameters
- config (PatchTSTConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The PatchTST for classification model. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
( past_values: Tensor, target_values: Tensor = None, past_observed_mask: Optional = None, output_hidden_states: Optional = None, output_attentions: Optional = None, return_dict: Optional = None )
Parameters
- past_values (torch.Tensor of shape (batch_size, sequence_length, num_input_channels), required) — Input sequence to the model.
- target_values (torch.Tensor, optional) — Labels associated with the past_values.
- past_observed_mask (torch.BoolTensor of shape (batch_size, sequence_length, num_input_channels), optional) — Boolean mask to indicate which past_values were observed and which were missing. Mask values selected in [0, 1]:
  - 1 for values that are observed,
  - 0 for values that are missing (i.e. NaNs that were replaced by zeros).
- output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers.
- output_attentions (bool, optional) — Whether or not to return the output attention of all layers.
- return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
Examples:
>>> import torch
>>> from transformers import PatchTSTConfig, PatchTSTForClassification
>>> # classification task with two input channels and 3 classes
>>> config = PatchTSTConfig(
...     num_input_channels=2,
...     num_targets=3,
...     context_length=512,
...     patch_length=12,
...     patch_stride=12,
...     use_cls_token=True,
... )
>>> model = PatchTSTForClassification(config=config)
>>> # during inference, one only provides past values
>>> past_values = torch.randn(20, 512, 2)
>>> outputs = model(past_values=past_values)
>>> labels = outputs.prediction_logits
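The classification head returns one logit per class, so a predicted label still has to be derived from them. A minimal sketch for one sample follows; the logit values are made up for illustration and the helpers are not part of the library.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities; subtracting the max avoids overflow."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits: List[float]) -> int:
    """Index of the largest logit, which is also the most probable class."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Made-up logits for a 3-class problem (num_targets=3).
logits = [0.2, 1.5, -0.3]
probs = softmax(logits)
label = predict_class(logits)

print(probs)
print(label)
```

On a batch of tensors the same step is usually an argmax over the last dimension of the logits.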
PatchTSTForPretraining
class transformers.PatchTSTForPretraining
( config: PatchTSTConfig )
Parameters
- config (PatchTSTConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The PatchTST for pretraining model. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
( past_values: Tensor, past_observed_mask: Optional = None, output_hidden_states: Optional = None, output_attentions: Optional = None, return_dict: Optional = None )
Parameters
- past_values (torch.Tensor of shape (batch_size, sequence_length, num_input_channels), required) — Input sequence to the model.
- past_observed_mask (torch.BoolTensor of shape (batch_size, sequence_length, num_input_channels), optional) — Boolean mask to indicate which past_values were observed and which were missing. Mask values selected in [0, 1]:
  - 1 for values that are observed,
  - 0 for values that are missing (i.e. NaNs that were replaced by zeros).
- output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers.
- output_attentions (bool, optional) — Whether or not to return the output attention of all layers.
- return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
Examples:
>>> from huggingface_hub import hf_hub_download
>>> import torch
>>> from transformers import PatchTSTConfig, PatchTSTForPretraining
>>> file = hf_hub_download(
... repo_id="hf-internal-testing/etth1-hourly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)
>>> # Config for random mask pretraining
>>> config = PatchTSTConfig(
...     num_input_channels=7,
...     context_length=512,
...     patch_length=12,
...     patch_stride=12,
...     mask_type='random',
...     random_mask_ratio=0.4,
...     use_cls_token=True,
... )
>>> # Config for forecast mask pretraining (this replaces the random mask config above)
>>> config = PatchTSTConfig(
...     num_input_channels=7,
...     context_length=512,
...     patch_length=12,
...     patch_stride=12,
...     mask_type='forecast',
...     num_forecast_mask_patches=5,
...     use_cls_token=True,
... )
>>> model = PatchTSTForPretraining(config)
>>> # during pretraining, one provides only past values; the masked patches serve as the targets
>>> outputs = model(past_values=batch["past_values"])
>>> loss = outputs.loss
>>> loss.backward()
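The two mask_type options differ only in which patches are hidden from the model. The sketch below illustrates both for a single channel. The helper names are hypothetical and the random selection is a simple stand-in, so the exact patches chosen by the library will differ.

```python
import random
from typing import List


def random_patch_mask(num_patches: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """True marks a masked patch; roughly mask_ratio of all patches are masked."""
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def forecast_patch_mask(num_patches: int, num_forecast_mask_patches: int) -> List[bool]:
    """True marks a masked patch; only the trailing patches are masked."""
    first_masked = num_patches - num_forecast_mask_patches
    return [i >= first_masked for i in range(num_patches)]


rng = random.Random(0)  # fixed seed so the example is repeatable
random_mask = random_patch_mask(num_patches=10, mask_ratio=0.4, rng=rng)
forecast_mask = forecast_patch_mask(num_patches=10, num_forecast_mask_patches=3)

print(random_mask)
print(forecast_mask)
```

Random masking scatters the hidden patches across the window, whereas forecast masking hides the end of the window, which makes the pretraining task resemble forecasting.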
PatchTSTForRegression
class transformers.PatchTSTForRegression
( config: PatchTSTConfig )
Parameters
- config (PatchTSTConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The PatchTST for regression model. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
( past_values: Tensor, target_values: Tensor = None, past_observed_mask: Optional = None, output_hidden_states: Optional = None, output_attentions: Optional = None, return_dict: Optional = None )
Parameters
- past_values (torch.Tensor of shape (batch_size, sequence_length, num_input_channels), required) — Input sequence to the model.
- target_values (torch.Tensor of shape (batch_size, num_targets)) — Target values associated with the past_values.
- past_observed_mask (torch.BoolTensor of shape (batch_size, sequence_length, num_input_channels), optional) — Boolean mask to indicate which past_values were observed and which were missing. Mask values selected in [0, 1]:
  - 1 for values that are observed,
  - 0 for values that are missing (i.e. NaNs that were replaced by zeros).
- output_hidden_states (bool, optional) — Whether or not to return the hidden states of all layers.
- output_attentions (bool, optional) — Whether or not to return the output attention of all layers.
- return_dict (bool, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
Examples:
>>> import torch
>>> from transformers import PatchTSTConfig, PatchTSTForRegression
>>> # Regression task with 6 input channels and regress 2 targets
>>> model = PatchTSTForRegression.from_pretrained("namctin/patchtst_etth1_regression")
>>> # during inference, one only provides past values, the model outputs the regression targets
>>> past_values = torch.randn(20, 512, 6)
>>> outputs = model(past_values=past_values)
>>> regression_outputs = outputs.regression_outputs
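The output_range option of PatchTSTConfig constrains regression outputs to a given interval. One common way to enforce such a bound, assumed here purely for illustration, is to pass the raw output through a sigmoid and rescale it. The helper squash_to_range is hypothetical and the library's mechanism may differ.

```python
import math
from typing import List, Tuple


def squash_to_range(raw_outputs: List[float], output_range: Tuple[float, float]) -> List[float]:
    """Map unbounded values into (low, high) with a rescaled sigmoid."""
    low, high = output_range
    return [low + (high - low) / (1.0 + math.exp(-y)) for y in raw_outputs]


# Made-up raw head outputs and an illustrative target interval.
raw_outputs = [-5.0, 0.0, 5.0]
bounded = squash_to_range(raw_outputs, output_range=(0.0, 10.0))

print(bounded)
```

A raw output of zero lands on the midpoint of the interval, and larger magnitudes approach the bounds without ever reaching them.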