CQ-AI
/

dynamicts

Model card Files Files and versions

dynamicts / docs /data_loader.md

Captain-Slow's picture

Upload 23 files

f5407b6 verified 8 days ago

|

history blame contribute delete

3.29 kB

	# DataLoader

	> Note: We are using `complaints.csv` as an example here, with the following columns: `date`, `complaints`. The `date` column is used as the time index and `complaints` as the target variable.

	The `DataLoader` class provides a unified interface for loading and managing time series data from CSV files in your Python projects. It is designed as the first step in your time series analysis workflow, ensuring your time series data is loaded, validated, and ready for further analysis.

	## Features

	- Loads time series data from CSV files into pandas DataFrames.
	- Standardizes column names to lowercase.
	- Checks if the time series index is regular (uniform intervals).
	- Saves metadata (columns, dtypes, shape, index name) to a JSON file.

	## Class: `DataLoader`

	### Initialization

	```python
	DataLoader(filepath: str, index_col: Optional[Union[str, int]] = None, parse_dates: Union[bool, list] = True)
	```

	- filepath: Path to the CSV file containing your time series data.
	- index_col: Name or position of the column to use as the time index (e.g., a timestamp or date column).
	- parse_dates: Whether to parse dates in the index column.

	> Note: Your CSV must contain a column representing the time axis (e.g., "date"). Set `index_col` to this column's name.

	### Methods

	#### `load() -> pd.DataFrame`

	Loads the time series data from the specified CSV file, standardizes column names, and sets the index name to lowercase.

	Standalone Example:
	```python
	from dynamicts.data_loader import DataLoader

	loader = DataLoader(filepath="data/complaints.csv", index_col="date")
	df = loader.load()
	print(df.head())
	```

	#### `is_regular() -> bool`

	Checks if the time series index is regular (i.e., intervals between timestamps are uniform). Returns `True` if regular, `False` otherwise.

	Standalone Example:
	```python
	from dynamicts.data_loader import DataLoader

	loader = DataLoader(filepath="data/complaints.csv", index_col="date")
	loader.load() # Must load data first
	is_reg = loader.is_regular()
	print("Is regular:", is_reg)
	```

	#### `save_metadata() -> None`

	Saves metadata (columns, dtypes, shape, index name) of the loaded DataFrame to a JSON file in the `metadata/` directory.

	Standalone Example:
	```python
	from dynamicts.data_loader import DataLoader

	loader = DataLoader(filepath="data/complaints.csv", index_col="date")
	loader.load() # Must load data first
	loader.save_metadata()
	print("Metadata saved.")
	```

	#### `run_pipeline() -> Optional[pd.DataFrame]`

	Runs the time series data loading pipeline:
	- Loads the data.
	- Checks for regularity.
	- Saves metadata if data is regular.
	- Returns the loaded DataFrame.

	Standalone Example:
	```python
	from dynamicts.data_loader import DataLoader

	loader = DataLoader(filepath="data/complaints.csv", index_col="date")
	data = loader.run_pipeline()
	if data is not None:
	print("Time series data loaded successfully!")
	```

	### Notes

	- The loader logs all actions and errors to a log file in the `logs/` directory.
	- If the time index is not regular, a warning is logged and the data is still returned for inspection.
	- Metadata is saved only if the time series data is regular.

	---