# UnivariateAnalysis
> **Note:** The following examples assume a time series DataFrame loaded from a file similar to `complaints.csv`, with the columns `date` and `complaints`.
The `UnivariateAnalysis` class provides a suite of methods for exploratory and statistical analysis of univariate time series data. It helps you understand the distribution, missing values, and outliers in your time series before further modeling or forecasting.
## Features
- Visualizes the distribution and boxplot of the target time series.
- Computes skewness and kurtosis with interpretation.
- Checks for missing values and provides recommendations.
- Detects outliers using IQR and Z-score methods.
- Logs plots and messages to HTML reports.
## Class: `UnivariateAnalysis`
### Initialization
```python
UnivariateAnalysis(df: pd.DataFrame, target_col: str, index_col: str = "date", output_filepath: str = "output_filepath")
```
- **df**: The time series DataFrame (indexed by the time column).
- **target_col**: The column name of the univariate time series to analyze.
- **index_col**: The name of the time index column (default: "date").
- **output_filepath**: Path prefix for saving HTML reports and plots.
> **Note:** Your DataFrame should have a time-based index (e.g., "date", "timestamp").
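
The examples in this document use a `df` variable without showing how it is built. A minimal sketch of one way to prepare such a DataFrame with pandas follows; the inline CSV text is made-up illustration data standing in for a file like `complaints.csv`, not content from the real file.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for a file such as complaints.csv.
csv_text = """date,complaints
2024-01-01,12
2024-01-02,15
2024-01-03,
2024-01-04,11
"""

# Parse the "date" column as datetimes and use it as the index,
# so the DataFrame is indexed by the time column.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Sort chronologically in case the source rows are out of order.
df = df.sort_index()

print(df.index.dtype)
print(df["complaints"].isna().sum())
```

With a real file, the `io.StringIO(csv_text)` argument would be replaced by the file path.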
### Methods
#### `plot_distribution()`
Plots the histogram and boxplot of the target time series column and logs the plot to the HTML report.
**Standalone Example:**
```python
from dynamicts.analysis import UnivariateAnalysis
analysis = UnivariateAnalysis(df, target_col="complaints", index_col="date", output_filepath="report")
fig = analysis.plot_distribution()
fig.show()
```
#### `check_distribution_stats()`
Computes skewness and kurtosis for the target column, interprets the results, and logs the summary to the HTML report.
**Standalone Example:**
```python
from dynamicts.analysis import UnivariateAnalysis
analysis = UnivariateAnalysis(df, target_col="complaints", index_col="date", output_filepath="report")
stats = analysis.check_distribution_stats()
print(stats["full_message"])
```
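To make the two statistics concrete, here is a rough sketch of computing skewness and kurtosis directly with pandas. It is not the implementation used by `check_distribution_stats()`; the function name `describe_shape` and the 0.5 cut-off used for the interpretation are assumptions chosen for illustration.

```python
import pandas as pd


def describe_shape(series: pd.Series) -> dict:
    """Return skewness, excess kurtosis, and a rough interpretation.

    Hypothetical helper: the 0.5 threshold is an illustrative choice.
    """
    values = series.dropna()
    skew = float(values.skew())
    # pandas reports excess kurtosis, so a normal distribution is near 0.
    kurt = float(values.kurt())

    if abs(skew) < 0.5:
        skew_label = "approximately symmetric"
    elif skew > 0:
        skew_label = "right-skewed"
    else:
        skew_label = "left-skewed"

    tail_label = "heavier tails than normal" if kurt > 0 else "lighter tails than normal"
    return {
        "skewness": skew,
        "kurtosis": kurt,
        "message": f"{skew_label}; {tail_label}",
    }


symmetric = describe_shape(pd.Series([1, 2, 3, 4, 5]))
right_skewed = describe_shape(pd.Series([1, 1, 1, 1, 10]))
print(symmetric["message"])
print(right_skewed["message"])
```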
#### `check_missing_values()`
Checks for missing values in the target column, reports the count and percentage, and logs recommendations to the HTML report.
**Standalone Example:**
```python
from dynamicts.analysis import UnivariateAnalysis
analysis = UnivariateAnalysis(df, target_col="complaints", index_col="date", output_filepath="report")
missing = analysis.check_missing_values()
print(missing["message"])
```
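The count and percentage that the method reports can be reproduced by hand with pandas. The sketch below shows the arithmetic only; `summarize_missing` and its dictionary keys are hypothetical names, not the keys returned by `check_missing_values()`.

```python
import numpy as np
import pandas as pd


def summarize_missing(series: pd.Series) -> dict:
    """Count missing values and express them as a percentage of all rows."""
    total = len(series)
    missing_count = int(series.isna().sum())
    # Guard against an empty series to avoid dividing by zero.
    missing_pct = 100.0 * missing_count / total if total else 0.0
    return {"missing_count": missing_count, "missing_percentage": missing_pct}


sample = pd.Series([12.0, np.nan, 15.0, 11.0])
summary = summarize_missing(sample)
print(summary)
```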
#### `detect_outliers(method="both", plot=True)`
Detects outliers in the target column using IQR, Z-score, or both. Optionally plots and logs the results.
- **method**: "iqr", "zscore", or "both" (default: "both").
- **plot**: Whether to plot the outliers (default: True).
**Standalone Example:**
```python
from dynamicts.analysis import UnivariateAnalysis
analysis = UnivariateAnalysis(df, target_col="complaints", index_col="date", output_filepath="report")
outliers = analysis.detect_outliers(method="both", plot=True)
print(f"Outliers detected: {outliers['outliers_detected']}")
```
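For readers unfamiliar with the two rules, the following sketch shows the textbook IQR and Z-score checks in plain pandas. It is an illustration under assumed conventions (a 1.5 × IQR fence and a |z| > 3 cut-off), not the code inside `detect_outliers()`; the function names and the sample data are made up.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is the usual convention."""
    values = series.dropna()
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask]


def zscore_outliers(series: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Values whose absolute Z-score exceeds the threshold."""
    values = series.dropna()
    std = values.std(ddof=0)  # population standard deviation
    if std == 0:
        # A constant series has no spread, so nothing can be an outlier.
        return values.iloc[0:0]
    z = (values - values.mean()) / std
    return values[z.abs() > threshold]


# Made-up data: twenty ordinary values plus one extreme value.
sample = pd.Series([10, 11, 12, 11] * 5 + [100])
iqr_found = iqr_outliers(sample)
z_found = zscore_outliers(sample)
print(iqr_found.tolist())
print(z_found.tolist())
```

Note that the Z-score rule depends on sample size: in a population-standard-deviation calculation, no value in a series of `n` points can have a Z-score above `sqrt(n - 1)`, so very short series may never trigger a threshold of 3 even when the IQR rule flags a point.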
#### `run_univariate_analysis(df, output_filepath, target_col, index_col="date")` (static method)
Runs the full univariate analysis pipeline: distribution plot, stats, missing values, and outlier detection. Displays results in a notebook environment.
**Standalone Example:**
```python
from dynamicts.analysis import UnivariateAnalysis
results = UnivariateAnalysis.run_univariate_analysis(
df=df,
output_filepath="report",
target_col="complaints",
index_col="date"
)
```
### Notes
- All plots and messages are logged to HTML reports using the provided `output_filepath`.
- The DataFrame should be indexed by the time column for proper time series analysis.
---