Tide is a tool for developing data processing pipelines and visualizing time series data, particularly suited for physical measurements. Key features include:
- Efficient Data Management
  - Organize and select data using a tagging system
- Pipeline Construction
  - Store and retrieve pipelines easily with JSON-based dictionary structures
  - Build dynamic pipelines that adjust based on the selected data
- Interactive Visualization
  - Create interactive plots to explore data (plotly)
  - Visualize the effect of a pipeline, or of a slice of a pipeline, on the data
- Custom Data Enrichment
  - Integrate external weather data sources
  - Implement autoregressive models for gap filling
  - Develop and incorporate custom data processors
Tide uses pandas DataFrames and Series for data handling, bigtree for tags and data selection, and scikit-learn's API for pipeline construction.
pip install python-tide
To begin, load your time series data into a pandas DataFrame, ensuring the index is a DatetimeIndex:
import pandas as pd

# Use the first column as the index and parse it as dates
df = pd.read_csv(
    "https://raw.githubusercontent.com/BuildingEnergySimulationTools/tide/main/tutorials/getting_started_ts.csv",
    parse_dates=True,
    index_col=0
)
Rename columns using Tide's tagging system.
The format is:
name__unit__bloc__sub_bloc
with tags separated by double underscores.
The order of the tags matters, and you can use "OTHER" as a placeholder in a position you do not want to fill.
You can use one or several tags.
df.columns = ["Tin__°C__Building", "Text__°C__Outdoor", "Heat__W__Building"]
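To make the positional format concrete, here is a minimal sketch that splits a tagged column name into its four levels. The helper `parse_tagged_name` and the choice to fill missing trailing tags with `None` are assumptions made for illustration. They are not part of Tide's API, and Tide may fill missing tags differently.

```python
# Illustrative helper, not part of Tide's API.
TAG_LEVELS = ("name", "unit", "bloc", "sub_bloc")


def parse_tagged_name(column: str) -> dict:
    """Split a tagged column name into its positional tags."""
    parts = column.split("__")
    if len(parts) > len(TAG_LEVELS):
        raise ValueError(f"Too many tags in {column!r}")
    # Assumption: missing trailing tags are reported as None.
    parts += [None] * (len(TAG_LEVELS) - len(parts))
    return dict(zip(TAG_LEVELS, parts))


columns = ["Tin__°C__Building", "Text__°C__Outdoor", "Heat__W__Building"]
parsed = [parse_tagged_name(c) for c in columns]
for tags in parsed:
    print(tags)
```

Because the tags are positional, swapping two of them (for example writing the bloc before the unit) would silently change their meaning, which is why the order matters.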
Plumber objects help with building pipelines and visualizing data:
from tide.plumbing import Plumber
plumber = Plumber(df)
Display the data organization as a tree:
plumber.show()
Select data using tags:
plumber.get_corrected_data("°C")
plumber.get_corrected_data("Building")
plumber.get_corrected_data("Tin")
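The idea behind tag-based selection can be sketched in plain Python: a column is selected when the requested tag appears among its double-underscore-separated parts. The function `select_by_tag` is a hypothetical stand-in, not Tide's implementation. Tide relies on a bigtree structure and its matching rules may differ.

```python
# Illustrative sketch of tag selection, not Tide's implementation.
def select_by_tag(columns, tag):
    """Return the columns in which `tag` is one of the '__'-separated tags."""
    return [c for c in columns if tag in c.split("__")]


columns = ["Tin__°C__Building", "Text__°C__Outdoor", "Heat__W__Building"]

print(select_by_tag(columns, "°C"))        # the two temperature columns
print(select_by_tag(columns, "Building"))  # Tin and Heat
print(select_by_tag(columns, "Tin"))       # a single column
```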
Show data availability:
plumber.plot_gaps_heatmap(time_step='d')
Plot time series with missing data highlighted:
fig = plumber.plot(plot_gaps=True)
fig.show()
Create a pipeline dictionary:
pipe_dict = {
    "step_1": [["Common_proc_1"], ["Common_proc_2", ["arg1", "arg2"]]],
    "step_2": {
        "selection_1": [["Proc_selection_1", {"arg": "arg_value"}]]
    }
}
Pipeline Rules:
- Use dictionaries for pipeline description
- Keys represent pipeline steps, e.g. "step_1"
- Step values can be lists (apply to all columns) or dicts (filter columns)
- Processing objects are listed as [class_name, arguments]
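The rules above can be read as a small nested structure. The sketch below walks such a dictionary and lists each processor with the step and the column selector it belongs to. `describe_pipeline` is a hypothetical helper written for this explanation. It only inspects the dictionary and does not build or run any Tide processor.

```python
# Illustrative helper, not part of Tide's API.
def describe_pipeline(pipe_dict):
    """Return (step, selector, class_name, arguments) for each processor.

    selector is None when the step applies to all columns.
    arguments is None when the processor is listed without arguments.
    """
    rows = []
    for step, value in pipe_dict.items():
        # A list applies to all columns; a dict maps a selector to a list.
        groups = value.items() if isinstance(value, dict) else [(None, value)]
        for selector, processors in groups:
            for proc in processors:
                class_name = proc[0]
                arguments = proc[1] if len(proc) > 1 else None
                rows.append((step, selector, class_name, arguments))
    return rows


pipe_dict = {
    "step_1": [["Common_proc_1"], ["Common_proc_2", ["arg1", "arg2"]]],
    "step_2": {"selection_1": [["Proc_selection_1", {"arg": "arg_value"}]]},
}

rows = describe_pipeline(pipe_dict)
for row in rows:
    print(row)
```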
Example Pipeline:
- Resample data to 15-minute intervals
- Interpolate temperature gaps ≤ 3 hours
- Fill larger Tin gaps (between 3 hours and 3 days) using an autoregressive model (FillGapsAR, configured here with model_name "Prophet")
pipe_dict = {
    "resample_15min": [["Resample", ["15min"]]],
    "interpolate_temps": {
        "°C": [["Interpolate", {"gaps_lte": "3h"}]]
    },
    "ar_tin": {
        "Tin": [
            [
                "FillGapsAR",
                {
                    "model_name": "Prophet",
                    "resample_at_td": "1h",
                    "gaps_gte": "3h",
                    "gaps_lte": "3d"
                }
            ]
        ]
    }
}
plumber.pipe_dict = pipe_dict
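Since the pipeline description contains only dictionaries, lists and strings, it can be stored as JSON with the standard library. The sketch below shows a round trip through a JSON string. It uses only Python's `json` module and makes no assumption about any save or load helper in Tide. `ensure_ascii=False` keeps the "°C" tag readable in the output.

```python
import json

pipe_dict = {
    "resample_15min": [["Resample", ["15min"]]],
    "interpolate_temps": {"°C": [["Interpolate", {"gaps_lte": "3h"}]]},
    "ar_tin": {
        "Tin": [
            [
                "FillGapsAR",
                {
                    "model_name": "Prophet",
                    "resample_at_td": "1h",
                    "gaps_gte": "3h",
                    "gaps_lte": "3d",
                },
            ]
        ]
    },
}

# Serialize to text; ensure_ascii=False writes "°C" instead of an escape.
text = json.dumps(pipe_dict, indent=2, ensure_ascii=False)

# Read it back and check that nothing was lost in the round trip.
restored = json.loads(text)
print(restored == pipe_dict)
```

To persist the description on disk, the same calls have file-based counterparts, `json.dump` and `json.load`, used with a file opened with `encoding="utf-8"`.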
Get the pipeline using the get_pipeline method:
plumber.get_pipeline(verbose=True)
Get the pipeline for specific columns:
plumber.get_pipeline(select="Building", verbose=True)
Visualize pipeline effects:
plumber.plot(
    steps=None,
    plot_gaps=True,
    steps_2=slice(None, "interpolate_temps"),
    plot_gaps_2=True,
    verbose=True
)
Step Arguments:
- None: No operation (Identity)
- str: Process until named step
- list[str]: Perform specified steps
- slice: Process a slice of the pipeline
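One way to picture these four argument types is a resolver that turns the argument into a list of step names. The sketch below is hypothetical and is not Tide's code. In particular, it assumes that a named step and the end of a slice are included in the result (as with label-based slicing in pandas), which Tide may or may not do, and it ignores the slice's step value.

```python
# Hypothetical resolver, written for illustration; not Tide's implementation.
def resolve_steps(step_names, steps):
    """Map a `steps` argument to the list of step names it designates."""
    if steps is None:
        # No operation: no pipeline step is applied.
        return []
    if isinstance(steps, str):
        # Assumption: the named step itself is included.
        return step_names[: step_names.index(steps) + 1]
    if isinstance(steps, slice):
        start = 0 if steps.start is None else step_names.index(steps.start)
        # Assumption: the end label is inclusive.
        stop = (
            len(step_names)
            if steps.stop is None
            else step_names.index(steps.stop) + 1
        )
        return step_names[start:stop]
    # Otherwise, treat the argument as an explicit list of step names.
    unknown = [s for s in steps if s not in step_names]
    if unknown:
        raise KeyError(f"Unknown steps: {unknown}")
    return list(steps)


names = ["resample_15min", "interpolate_temps", "ar_tin"]

print(resolve_steps(names, None))
print(resolve_steps(names, "interpolate_temps"))
print(resolve_steps(names, ["ar_tin"]))
print(resolve_steps(names, slice(None, "interpolate_temps")))
print(resolve_steps(names, slice(None)))
```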
Compare full pipeline to raw data:
plumber.plot(
    steps=None,
    plot_gaps=True,
    steps_2=slice(None),
    plot_gaps_2=True,
    verbose=True
)