DataLoader
Note: We are using complaints.csv as an example here, with the following columns: date, complaints. The date column is used as the time index and complaints as the target variable.
The DataLoader class provides a unified interface for loading and managing time series data from CSV files in your Python projects. It is designed as the first step in your time series analysis workflow, ensuring your data is loaded, validated, and ready for further analysis.
Features
- Loads time series data from CSV files into pandas DataFrames.
- Standardizes column names to lowercase.
- Checks if the time series index is regular (uniform intervals).
- Saves metadata (columns, dtypes, shape, index name) to a JSON file.
Class: DataLoader
Initialization
DataLoader(filepath: str, index_col: Optional[Union[str, int]] = None, parse_dates: Union[bool, list] = True)
- filepath: Path to the CSV file containing your time series data.
- index_col: Name or position of the column to use as the time index (e.g., a timestamp or date column).
- parse_dates: Whether to parse the index column as dates (default: True). The signature also accepts a list in place of a boolean.
Note: Your CSV must contain a column representing the time axis (e.g., "date"). Set index_col to this column's name.
Methods
load() -> pd.DataFrame
Loads the time series data from the specified CSV file, converts column names to lowercase, and converts the index name to lowercase. Returns the resulting DataFrame.
Standalone Example:
from dynamicts.data_loader import DataLoader
loader = DataLoader(filepath="data/complaints.csv", index_col="date")
df = loader.load()
print(df.head())
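For intuition, the loading step described above can be approximated in plain pandas. This is a minimal sketch, not the DataLoader implementation: the function name load_sketch, the in-memory CSV text, and its values are all hypothetical and exist only to make the example self-contained.

```python
import io

import pandas as pd

# Hypothetical stand-in for complaints.csv; the values are made up.
# Headers are capitalized on purpose to show the lowercasing step.
CSV_TEXT = "Date,Complaints\n2024-01-01,12\n2024-01-02,9\n2024-01-03,15\n"


def load_sketch(source, index_col="Date"):
    """Read a CSV, parse the index as dates, and lowercase all names."""
    df = pd.read_csv(source, index_col=index_col, parse_dates=True)
    # Lowercase the column names
    df.columns = [str(c).lower() for c in df.columns]
    # Lowercase the index name, if there is one
    if df.index.name is not None:
        df.index.name = str(df.index.name).lower()
    return df


df = load_sketch(io.StringIO(CSV_TEXT))
print(df.head())
```

After this runs, the frame has a single complaints column and a datetime index named date, which matches the shape of data the rest of this page assumes.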
is_regular() -> bool
Checks if the time series index is regular (i.e., intervals between timestamps are uniform). Returns True if regular, False otherwise.
Standalone Example:
from dynamicts.data_loader import DataLoader
loader = DataLoader(filepath="data/complaints.csv", index_col="date")
loader.load() # Must load data first
is_reg = loader.is_regular()
print("Is regular:", is_reg)
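One common way to test for uniform intervals is to look at the differences between consecutive timestamps and check that only one distinct value occurs. The sketch below illustrates that idea under stated assumptions: is_regular_sketch is a hypothetical helper, not the method DataLoader uses internally, and treating an index with fewer than two timestamps as regular is a choice made for this example.

```python
import pandas as pd


def is_regular_sketch(index: pd.DatetimeIndex) -> bool:
    """Return True when all consecutive timestamps are equally spaced."""
    if len(index) < 2:
        # No intervals to compare; treated as regular in this sketch
        return True
    # Differences between neighbouring timestamps (first entry is NaT)
    deltas = index.to_series().diff().dropna()
    return deltas.nunique() == 1


daily = pd.date_range("2024-01-01", periods=5, freq="D")
gappy = daily.delete(2)  # drop 2024-01-03 to create a gap

print(is_regular_sketch(daily))  # True
print(is_regular_sketch(gappy))  # False
```

Calendar-based frequencies such as month-end have unequal day counts between timestamps, so a strict difference check like this one would report them as irregular.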
save_metadata() -> None
Saves metadata (columns, dtypes, shape, index name) of the loaded DataFrame to a JSON file in the metadata/ directory.
Standalone Example:
from dynamicts.data_loader import DataLoader
loader = DataLoader(filepath="data/complaints.csv", index_col="date")
loader.load() # Must load data first
loader.save_metadata()
print("Metadata saved.")
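To show what the four metadata fields might look like once serialized, here is a minimal sketch that collects them from a DataFrame and writes them as JSON. The helper build_metadata, the key names, and the file name complaints_meta.json are assumptions for illustration; the actual file layout produced by save_metadata() may differ. A temporary directory is used so the example leaves no files behind.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def build_metadata(df: pd.DataFrame) -> dict:
    """Collect columns, dtypes, shape, and index name in JSON-friendly form."""
    return {
        "columns": list(df.columns),
        # dtype objects are not JSON serializable, so store their names
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "shape": list(df.shape),
        "index_name": df.index.name,
    }


# Made-up data with the same column layout as the complaints example
df = pd.DataFrame(
    {"complaints": [12, 9, 15]},
    index=pd.date_range("2024-01-01", periods=3, freq="D", name="date"),
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "metadata" / "complaints_meta.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(build_metadata(df), indent=2))
    saved = json.loads(out_path.read_text())

print(saved)
```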
run_pipeline() -> Optional[pd.DataFrame]
Runs the time series data loading pipeline:
- Loads the data.
- Checks for regularity.
- Saves metadata if data is regular.
- Returns the loaded DataFrame.
Standalone Example:
from dynamicts.data_loader import DataLoader
loader = DataLoader(filepath="data/complaints.csv", index_col="date")
data = loader.run_pipeline()
if data is not None:
    print("Time series data loaded successfully!")
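The control flow of the pipeline steps listed above can be sketched in plain pandas as follows. This is an illustration only: run_pipeline_sketch is a hypothetical function, returning None on a failed load is an assumption suggested by the Optional return type, and the metadata step is replaced by a print statement rather than a real file write.

```python
import io
from typing import Optional

import pandas as pd


def run_pipeline_sketch(source) -> Optional[pd.DataFrame]:
    """Load, check regularity, note whether metadata would be saved."""
    try:
        df = pd.read_csv(source, index_col=0, parse_dates=True)
    except (OSError, ValueError) as exc:
        # Assumption: a failed load yields None instead of raising
        print(f"load failed: {exc}")
        return None
    df.columns = [str(c).lower() for c in df.columns]

    # Regular means at most one distinct gap between timestamps
    steps = df.index.to_series().diff().dropna()
    if steps.nunique() <= 1:
        print("regular index: metadata would be saved here")
    else:
        print("irregular index: metadata skipped, data still returned")
    return df


# Made-up data with a gap between 2024-01-02 and 2024-01-05
irregular_csv = "date,complaints\n2024-01-01,12\n2024-01-02,9\n2024-01-05,15\n"
result = run_pipeline_sketch(io.StringIO(irregular_csv))
print(result)
```

Because the data is returned even when the index is irregular, callers that need uniform intervals still have to check regularity themselves before running downstream analysis.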
Notes
- The loader logs all actions and errors to a log file in the logs/ directory.
- If the time index is not regular, a warning is logged and the data is still returned for inspection.
- Metadata is saved only if the time series data is regular.