---
tags:
- generated_from_trainer
- time series
- forecasting
- pretrained models
- foundation models
- time series foundation models
- time-series
license: apache-2.0
pipeline_tag: time-series-forecasting
model-index:
- name: patchtst_etth1_forecast
results: []
---
# PatchTST model pre-trained on ETTh1 dataset
<!-- Provide a quick summary of what the model is/does. -->
[`PatchTST`](https://huggingface.co/docs/transformers/model_doc/patchtst) is a transformer-based model for time series modeling tasks, including forecasting, regression, and classification. This repository contains a `PatchTST` model pre-trained on all seven channels of the `ETTh1` dataset.
This particular pre-trained model produces a Mean Squared Error (MSE) of 0.3881 on the `test` split of the `ETTh1` dataset when forecasting 96 hours into the future with a historical data window of 512 hours.
For training and evaluating a `PatchTST` model, you can refer to this [demo notebook](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/patch_tst_getting_started.ipynb).
## Model Details
### Model Description
The `PatchTST` model was proposed in [A Time Series is Worth 64 Words: Long-term Forecasting with Transformers](https://arxiv.org/abs/2211.14730) by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam.
At a high level, the model vectorizes the time series into patches of a given size, encodes the resulting sequence of vectors with a Transformer, and outputs the forecast over the prediction length through an appropriate head.
The model is based on two key components: (i) segmentation of the time series into subseries-level patches, which serve as input tokens to the Transformer; and (ii) channel independence, where each channel contains a single univariate time series and all channels share the same embedding and Transformer weights. The patching design has a threefold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced for the same look-back window; and the model can attend to a longer history. The channel-independent patch time series Transformer (PatchTST) significantly improves long-term forecasting accuracy compared with state-of-the-art Transformer-based models.
In addition, PatchTST has a modular design to seamlessly support masked time series pre-training as well as direct time series forecasting, classification, and regression.
<img src="patchtst_architecture.png" alt="Architecture" width="600" />
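As a concrete illustration of the patching step, the sketch below computes the number of patch tokens obtained per channel from a 512-step context window. The patch length of 16 and stride of 8 are the values used in the paper and are assumptions here; the exact values for this checkpoint are defined in its configuration file.

```python
# Number of patch tokens produced per channel from one context window.
# patch_length=16 and stride=8 follow the paper; the values for this
# checkpoint are stored in its config.json.
context_length = 512
patch_length = 16
stride = 8

num_patches = (context_length - patch_length) // stride + 1
print(num_patches)  # 63
```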
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** [PatchTST Hugging Face](https://huggingface.co/docs/transformers/model_doc/patchtst)
- **Paper:** [PatchTST ICLR 2023 paper](https://arxiv.org/abs/2211.14730)
- **Demo:** [Get started with PatchTST](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/patch_tst_getting_started.ipynb)
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
This pre-trained model can be used for fine-tuning or evaluation on any electricity transformer dataset that has the same channels as `ETTh1`, namely `HUFL, HULL, MUFL, MULL, LUFL, LULL, OT`. The model predicts the next 96 hours based on the input values from the preceding 512 hours. The data must be normalized before being passed to the model. For a more comprehensive understanding of data pre-processing, please consult the paper or the demo.
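As a rough illustration of the required pre-processing, the sketch below standardizes the seven channels using statistics computed on the training portion only and extracts a 512-hour input window. The local file name and the 12-month training boundary are assumptions; the demo notebook remains the authoritative reference.

```python
import pandas as pd

# Assumed local copy of the ETTh1 CSV (see the Training Data section for the source).
df = pd.read_csv("ETTh1.csv", parse_dates=["date"])
channels = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL", "OT"]

# Standardize each channel with statistics from the training portion only
# (assumed here to be the first 12 months of hourly data), then take the
# most recent 512 hours as the model's input window.
train = df[channels].iloc[: 12 * 30 * 24]
mean, std = train.mean(), train.std()
normalized = (df[channels] - mean) / std
window = normalized.iloc[-512:].to_numpy()  # shape (512, 7)
```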
## How to Get Started with the Model
Use the [demo notebook](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/patch_tst_getting_started.ipynb) to get started with the model.
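A minimal loading sketch is also shown below. The repository id is a placeholder for this model's actual Hub path, and random values stand in for a real, normalized 512-hour window.

```python
import torch
from transformers import PatchTSTForPrediction

# "namespace/patchtst_etth1_forecast" is a placeholder -- substitute the
# actual Hub id of this repository.
model = PatchTSTForPrediction.from_pretrained("namespace/patchtst_etth1_forecast")
model.eval()

# Random values stand in for a real, normalized window of the last 512 hours
# across the 7 ETTh1 channels, with a batch size of 1.
past_values = torch.randn(1, 512, 7)

with torch.no_grad():
    outputs = model(past_values=past_values)

# Point forecast for the next 96 hours, shape (1, 96, 7).
forecast = outputs.prediction_outputs
print(forecast.shape)
```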
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
[`ETTh1`/train split](https://github.com/zhouhaoyi/ETDataset/blob/main/ETT-small/ETTh1.csv).
Train/validation/test splits are shown in the [demo](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/patch_tst_getting_started.ipynb).
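For reference, the sketch below reproduces the split boundaries commonly used for `ETTh1` (12 months of hourly data for training, 4 for validation, 4 for testing, with 30-day months); the exact indices used for this checkpoint are defined in the demo notebook.

```python
import pandas as pd

# Assumed local copy of the CSV linked above.
df = pd.read_csv("ETTh1.csv", parse_dates=["date"])

# Commonly used ETT split boundaries (12 / 4 / 4 months of hourly data).
train_end = 12 * 30 * 24              # 8640
valid_end = train_end + 4 * 30 * 24   # 11520
test_end = valid_end + 4 * 30 * 24    # 14400

train_df = df.iloc[:train_end]
valid_df = df.iloc[train_end:valid_end]
test_df = df.iloc[valid_end:test_end]
```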
### Training hyperparameters
The following hyperparameters were used during training (mirrored in the configuration sketch after the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
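Expressed as a `TrainingArguments` configuration, these settings look roughly as follows; `output_dir` and any parameter not listed above are assumptions rather than values taken from the original run.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./patchtst_etth1_forecast",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    evaluation_strategy="epoch",  # assumed, to reproduce the per-epoch validation losses below
)
```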
### Training Results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.4306 | 1.0 | 1005 | 0.7268 |
| 0.3641 | 2.0 | 2010 | 0.7456 |
| 0.348 | 3.0 | 3015 | 0.7161 |
| 0.3379 | 4.0 | 4020 | 0.7428 |
| 0.3284 | 5.0 | 5025 | 0.7681 |
| 0.321 | 6.0 | 6030 | 0.7842 |
| 0.314 | 7.0 | 7035 | 0.7991 |
| 0.3088 | 8.0 | 8040 | 0.8021 |
| 0.3053 | 9.0 | 9045 | 0.8199 |
| 0.3019 | 10.0 | 10050 | 0.8173 |
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data
[`ETTh1`/test split](https://github.com/zhouhaoyi/ETDataset/blob/main/ETT-small/ETTh1.csv).
Train/validation/test splits are shown in the [demo](https://github.com/IBM/tsfm/blob/main/notebooks/hfdemo/patch_tst_getting_started.ipynb).
### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
Mean Squared Error (MSE).
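MSE here is the mean of the squared differences between the predicted and actual (normalized) values over all test windows, forecast horizons, and channels; a minimal illustration with placeholder tensors:

```python
import torch

# Shapes are illustrative: (num_windows, prediction_length=96, num_channels=7).
predictions = torch.randn(100, 96, 7)  # placeholder forecasts
targets = torch.randn(100, 96, 7)      # placeholder ground truth (normalized)
mse = torch.mean((predictions - targets) ** 2)
print(mse.item())
```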
### Results
The model achieves an MSE of 0.3881 on the evaluation (test) dataset.
#### Hardware
1 NVIDIA A100 GPU
#### Framework versions
- Transformers 4.36.0.dev0
- Pytorch 2.0.1
- Datasets 2.14.4
- Tokenizers 0.14.1
## Citation
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
```
@misc{nie2023time,
title={A Time Series is Worth 64 Words: Long-term Forecasting with Transformers},
author={Yuqi Nie and Nam H. Nguyen and Phanwadee Sinthong and Jayant Kalagnanam},
year={2023},
eprint={2211.14730},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
**APA:**
```
Nie, Y., Nguyen, N., Sinthong, P., & Kalagnanam, J. (2023). A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. arXiv preprint arXiv:2211.14730.
```