# Correlation
> **Note:** The examples below use `complaints.csv`, which has two columns: `date` and `complaints`. The `date` column is the time index and `complaints` is the target variable.
The `Correlation` class provides methods for analyzing and visualizing autocorrelation (ACF) and partial autocorrelation (PACF) in univariate time series data. The ACF shows how strongly a series correlates with lagged copies of itself; the PACF shows the correlation at each lag after the effect of shorter lags has been removed. Together they reveal lag relationships and dependencies in the series.
## Features
- Plots autocorrelation (ACF) for a given time series and number of lags.
- Plots partial autocorrelation (PACF) for a given time series and number of lags.
- Supports both instance-based and standalone usage.
- Optionally logs plots to HTML reports.
## Class: `Correlation`
### Initialization
```python
Correlation(df: pd.DataFrame = None, target_col: str = None, lags: int = 20, output_filepath: str = None)
```
- **df**: The time series DataFrame (indexed by the time column).
- **target_col**: The column name of the univariate time series to analyze.
- **lags**: Number of lags to use for correlation plots (default: 20).
- **output_filepath**: Path prefix for saving HTML reports and plots.
> **Note:** Your DataFrame should have a time-based index (e.g., "date", "timestamp").
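A minimal sketch of preparing such a DataFrame with pandas. The rows below are made-up stand-ins for `complaints.csv`, read from an in-memory string so the snippet runs on its own; with the real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up rows standing in for complaints.csv (columns: date, complaints)
csv_text = """date,complaints
2024-01-03,9
2024-01-01,12
2024-01-02,15
"""

# Parse the date column and use it as the index, then sort chronologically
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")
df = df.sort_index()

print(type(df.index).__name__)  # DatetimeIndex
```

Sorting by the index is a precaution: lag-based statistics assume the observations are in time order.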
### Methods
#### `acf_plot(data: pd.Series = None, lags: int = None, save: bool = True, output_filepath: str = None)`
Plots the autocorrelation function (ACF) for the specified time series and number of lags.
- **data**: Optional. A pandas Series to plot. If not provided, uses the instance's DataFrame and target column.
- **lags**: Optional. Number of lags to plot. Defaults to the instance's `lags`.
- **save**: Optional. Whether to save the plot to an HTML report (default: `True`).
- **output_filepath**: Optional. Path for saving the report.
**Example:**
```python
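import pandas as pd
from dynamicts.lag_correlation import Correlation

# Load the example data with the date column as the time index
df = pd.read_csv("complaints.csv", parse_dates=["date"], index_col="date")

# Instance-based usage: the series and lags come from the constructor
corr = Correlation(df, target_col="complaints", lags=30, output_filepath="report")
fig = corr.acf_plot()
fig.show()

# Standalone usage: pass the series and lags directly to the method
corr = Correlation()
fig = corr.acf_plot(data=df["complaints"], lags=30, output_filepath="report")
fig.show()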
```
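To make the plotted quantity concrete, here is a minimal pure-Python sketch of the sample autocorrelation. It illustrates the standard (biased) estimator and is not necessarily how `Correlation` computes it internally; the function name `autocorrelation` and the input values are made up for illustration.

```python
def autocorrelation(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    # Sum of squared deviations; the lag-0 term that normalizes all others
    total = sum(c * c for c in centered)
    if total == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / total
        for lag in range(max_lag + 1)
    ]


# Made-up illustrative values, not taken from complaints.csv
series = [1.0, 2.0, 3.0, 4.0, 5.0]
acf_values = autocorrelation(series, max_lag=2)
print(acf_values)  # lag 0 is always 1.0
```

Each bar in an ACF plot corresponds to one entry of such a list, with the lag on the horizontal axis.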
#### `pacf_plot(data: pd.Series = None, lags: int = None, save: bool = True, output_filepath: str = None)`
Plots the partial autocorrelation function (PACF) for the specified time series and number of lags.
- **data**: Optional. A pandas Series to plot. If not provided, uses the instance's DataFrame and target column.
- **lags**: Optional. Number of lags to plot. Defaults to the instance's `lags`.
- **save**: Optional. Whether to save the plot to an HTML report (default: `True`).
- **output_filepath**: Optional. Path for saving the report.
**Example:**
```python
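import pandas as pd
from dynamicts.lag_correlation import Correlation

# Load the example data with the date column as the time index
df = pd.read_csv("complaints.csv", parse_dates=["date"], index_col="date")

# Instance-based usage: the series and lags come from the constructor
corr = Correlation(df, target_col="complaints", lags=30, output_filepath="report")
fig = corr.pacf_plot()
fig.show()

# Standalone usage: pass the series and lags directly to the method
corr = Correlation()
fig = corr.pacf_plot(data=df["complaints"], lags=30, output_filepath="report")
fig.show()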
```
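As a rough illustration of what a PACF plot shows, the sketch below derives partial autocorrelations from autocorrelation values with the Durbin-Levinson recursion. This is one common way to compute the PACF and is an assumption here, not a description of what `Correlation` does internally; the function name `partial_autocorrelation` is hypothetical. The input is the theoretical ACF of an AR(1) process with coefficient 0.5, for which the PACF is known to cut off after lag 1.

```python
def partial_autocorrelation(acf, max_lag):
    """PACF for lags 0..max_lag via the Durbin-Levinson recursion.

    `acf` must hold autocorrelations for lags 0..max_lag, with acf[0] == 1.
    """
    pacf = [1.0]
    prev = []  # AR coefficients of the order-(k-1) fit
    for k in range(1, max_lag + 1):
        num = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den  # partial autocorrelation at lag k
        # Update the lower-order coefficients, then append the new one
        current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5: r(k) = 0.5 ** k
ar1_acf = [0.5 ** k for k in range(4)]
pacf_values = partial_autocorrelation(ar1_acf, max_lag=3)
print(pacf_values)  # only lag 1 is non-zero beyond lag 0
```

A sharp cutoff like this in a PACF plot is the usual hint that a low-order autoregressive model may describe the series.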
### Notes
- The DataFrame should be indexed by the time column for proper time series analysis.
- Plots can be logged to HTML reports if `save=True` and `output_filepath` is provided.
---