Print meta data (#121)
* Add the sensor and event resolution to the string representation of the BeliefsDataFrame

* Add the sensor and event resolution to the string representation of the BeliefsSeries

* Refactor to util function

* Refactor util function to print out all metadata

* Add docstring with example

* Update printed examples in documentation

* Update sensor name and unit of example BeliefsDataFrame

* Use str instead of __repr__

* simplify example
Flix6x authored Nov 10, 2022
1 parent 82620a3 commit f5ed121
Showing 7 changed files with 65 additions and 17 deletions.
23 changes: 15 additions & 8 deletions README.md
@@ -25,6 +25,7 @@ Getting started (or try one of the [other ways to create a BeliefsDataFrame](tim
event_value
event_start belief_time source cumulative_probability
2000-03-05 11:00:00+00:00 2000-03-05 11:00:00+00:00 Thermometer 0.5 21
sensor: <Sensor: Indoor temperature>, event_resolution: 0:00:00

The package contains the following functionality:

@@ -69,8 +70,8 @@ Together these index levels describe data points as probabilistic beliefs.
Because of the sparse representation of index levels (a clever default setting in pandas) we get clean-looking data,
as we show here in a printout of the example BeliefsDataFrame in our examples module:

>>> import timely_beliefs
>>> df = timely_beliefs.examples.get_example_df()
>>> import timely_beliefs as tb
>>> df = tb.examples.get_example_df()
>>> df.head(8)
event_value
event_start belief_time source cumulative_probability
@@ -82,20 +83,21 @@ as we show here in a printout of the example BeliefsDataFrame in our examples mo
2000-01-01 01:00:00+00:00 Source A 0.1587 99
0.5000 100
0.8413 101
sensor: <Sensor: weight>, event_resolution: 0:15:00

The first 8 entries of this BeliefsDataFrame show beliefs about a single event.
Beliefs were formed by two distinct sources (A and B), with the first updating its beliefs at a later time.
Source A first thought the value of this event would be 100 ± 10 (the probabilities suggest a normal distribution),
and then increased its accuracy by lowering the standard deviation to 1.
Source B thought the value would be equally likely to be 0 or 100.

More information about what actually constitutes an event is stored as metadata in the BeliefsDataFrame.
More information about what actually constitutes an event is stored as metadata in the BeliefsDataFrame, which is printed out just below the frame.
The sensor property keeps track of invariable information such as the unit of the data and the resolution of events.

>>> df.sensor
<Sensor: Sensor 1>
>>> df.sensor.unit
'kg'

Currently a BeliefsDataFrame contains data about a single sensor only.
Currently, a BeliefsDataFrame contains data about a single sensor only.
_For a future release we are considering adding the sensor as another index level,
to offer out-of-the-box support for aggregating over multiple sensors._

@@ -125,23 +127,28 @@ each with a different viewpoint.
With a rolling viewpoint, you get the accuracy of beliefs at a certain `belief_horizon` before (or after) `knowledge_time`,
for example, some days before each event ends.

>>> from datetime import timedelta
>>> df.rolling_viewpoint_accuracy(timedelta(days=2, hours=9), reference_source=df.lineage.sources[0])
mae mape wape
source
Source A 1.482075 0.014821 0.005928
Source B 125.853250 0.503413 0.503413
sensor: <Sensor: weight>, event_resolution: 0:15:00

With a fixed viewpoint, you get the accuracy of beliefs held at a certain `belief_time`.

>>> df.fixed_viewpoint_accuracy(datetime(2000, 1, 2, tzinfo=utc), reference_source=df.lineage.sources[0])
>>> from datetime import datetime
>>> import pytz
>>> df = df.fixed_viewpoint_accuracy(datetime(2000, 1, 2, tzinfo=pytz.utc), reference_source=df.lineage.sources[0])
mae mape wape
source
Source A 0.00000 0.000000 0.000000
Source B 125.85325 0.503413 0.503413
sensor: <Sensor: weight>, event_resolution: 0:15:00

For an intuitive representation of accuracy that works in many cases, we suggest using:

>>> `df["accuracy"] = 1 - df["wape"]`
>>> df["accuracy"] = 1 - df["wape"]

[A more detailed discussion of accuracy and error metrics can be found here.](timely_beliefs/docs/accuracy.md)

10 changes: 9 additions & 1 deletion timely_beliefs/beliefs/classes.py
@@ -43,7 +43,7 @@
import timely_beliefs.utils as tb_utils
from timely_beliefs.beliefs import probabilistic_utils
from timely_beliefs.beliefs import utils as belief_utils
from timely_beliefs.beliefs.utils import is_pandas_structure, is_tb_structure
from timely_beliefs.beliefs.utils import is_pandas_structure, is_tb_structure, meta_repr
from timely_beliefs.db_base import Base
from timely_beliefs.sensors import utils as sensor_utils
from timely_beliefs.sensors.classes import DBSensor, Sensor, SensorDBMixin
@@ -663,6 +663,10 @@ def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
return

def __repr__(self):
"""Add the sensor and event resolution to the string representation of the BeliefsSeries."""
return super().__repr__() + "\n" + meta_repr(self)


class BeliefsDataFrame(pd.DataFrame):
"""Beliefs about a sensor.
@@ -2036,6 +2040,10 @@ def set_reference_values(
return df.convert_index_from_belief_horizon_to_time()
return pd.concat([df, reference_df], axis=1)

def __repr__(self):
"""Add the sensor and event resolution to the string representation of the BeliefsDataFrame."""
return super().__repr__() + "\n" + meta_repr(self)


def set_columns_and_indices_for_empty_frame(df, columns, indices, default_types):
"""Set appropriate columns and indices for the empty BeliefsDataFrame."""
19 changes: 19 additions & 0 deletions timely_beliefs/beliefs/utils.py
@@ -776,3 +776,22 @@ def extreme_timedeltas_not_equal(
if isinstance(td_a, pd.Timedelta):
td_a = td_a.to_pytimedelta()
return td_a != td_b


def meta_repr(
tb_structure: Union["classes.BeliefsDataFrame", "classes.BeliefsSeries"]
) -> str:
"""Returns a string representation of all metadata.
For example:
>>> from timely_beliefs.examples import get_example_df
>>> df = get_example_df()
>>> meta_repr(df)
'sensor: <Sensor: weight>, event_resolution: 0:15:00'
"""
return ", ".join(
[
": ".join([attr, str(getattr(tb_structure, attr))])
for attr in tb_structure._metadata
]
)
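
The `__repr__` overrides and the `meta_repr` helper in this commit build on the pandas convention of listing subclass attributes in `_metadata`. The following is a minimal sketch of that pattern in isolation, not the library's implementation: `TaggedFrame` and `meta_repr_sketch` are hypothetical names, the metadata values are a plain string and a `datetime.timedelta` rather than a real `Sensor`, and the `getattr` default of `None` is an assumption made so that unset attributes do not raise.

```python
from datetime import timedelta

import pandas as pd


def meta_repr_sketch(structure) -> str:
    """Join every attribute named in `_metadata` as 'name: value' pairs.

    Hypothetical stand-in for the helper added in this commit; unset
    attributes fall back to None (an assumption of this sketch).
    """
    return ", ".join(
        f"{attr}: {getattr(structure, attr, None)}" for attr in structure._metadata
    )


class TaggedFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass that carries two metadata attributes."""

    # pandas treats names listed here as regular attributes, not columns
    _metadata = ["sensor", "event_resolution"]

    @property
    def _constructor(self):
        # Keep the subclass type when pandas builds new frames from this one
        return TaggedFrame

    def __repr__(self) -> str:
        # Append a one-line metadata footer to the regular pandas printout
        return super().__repr__() + "\n" + meta_repr_sketch(self)


df = TaggedFrame({"event_value": [21]})
df.sensor = "thermometer"
df.event_resolution = timedelta(minutes=15)
print(repr(df))
```

Because the footer is generated from `_metadata` in this sketch, adding another attribute name to that list would make it appear in the printout without any change to `__repr__`.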
5 changes: 5 additions & 0 deletions timely_beliefs/docs/init.md
@@ -34,13 +34,15 @@ Required arguments:


>>> import pandas as pd
>>> import pytz
>>> s = pd.Series([63, 60], index=pd.date_range(datetime(2000, 1, 3, 9), periods=2, tz=pytz.utc))
>>> bdf = tb.BeliefsDataFrame(s, belief_horizon=timedelta(hours=0), source=source, sensor=sensor)
>>> print(bdf)
event_value
event_start belief_horizon source cumulative_probability
2000-01-03 09:00:00+00:00 0 days EPEX 0.5 63
2000-01-04 09:00:00+00:00 0 days EPEX 0.5 60
sensor: <Sensor: EPEX SPOT day-ahead price>, event_resolution: 1:00:00

## From a Pandas DataFrame

@@ -53,6 +55,7 @@ Pass a Pandas DataFrame with columns ["event_start", "belief_time", "source", "c
event_start belief_horizon source cumulative_probability
2000-01-03 09:00:00+00:00 0 days EPEX 0.5 63
2000-01-03 10:00:00+00:00 0 days EPEX 0.5 60
sensor: <Sensor: EPEX SPOT day-ahead price>, event_resolution: 1:00:00

Alternatively, a keyword argument can be used to replace a column that contains the same value for each belief.

@@ -63,6 +66,7 @@ Alternatively, a keyword argument can be used to replace a column that contains
event_start belief_horizon source cumulative_probability
2000-01-03 09:00:00+00:00 0 days EPEX 0.5 63
2000-01-03 10:00:00+00:00 0 days EPEX 0.5 60
sensor: <Sensor: EPEX SPOT day-ahead price>, event_resolution: 1:00:00

## From a CSV file

@@ -99,3 +103,4 @@ Create a list of `TimedBelief` or `DBTimedBelief` objects and use it to initiali
event_start belief_time source cumulative_probability
2000-01-03 09:00:00+00:00 2000-01-03 10:00:00+00:00 EPEX 0.5 63
2000-01-03 10:00:00+00:00 2000-01-03 11:00:00+00:00 EPEX 0.5 60
sensor: <Sensor: EPEX SPOT day-ahead price>, event_resolution: 1:00:00
5 changes: 4 additions & 1 deletion timely_beliefs/docs/resampling.md
@@ -14,7 +14,8 @@ Resampling a BeliefsDataFrame can be an expensive operation, especially when the
Upsample to events with a resolution of 5 minutes:

>>> from datetime import timedelta
>>> df = timely_beliefs.examples.get_example_df()
>>> import timely_beliefs as tb
>>> df = tb.examples.get_example_df()
>>> df5m = df.resample_events(timedelta(minutes=5))
>>> df5m.sort_index(level=["belief_time", "source"]).head(9)
event_value
@@ -28,6 +29,7 @@ Upsample to events with a resolution of 5 minutes:
2000-01-03 09:10:00+00:00 2000-01-01 00:00:00+00:00 Source A 0.1587 90.0
0.5000 100.0
0.8413 110.0
sensor: <Sensor: weight>, event_resolution: 0:05:00

When resampling, the event resolution of the underlying sensor remains the same (it's still a fixed property of the sensor):

@@ -66,6 +68,7 @@ Downsample to events with a resolution of 2 hours:
2000-01-03 12:00:00+00:00 2000-01-01 00:00:00+00:00 Source A 0.158700 360.0
0.500000 400.0
1.000000 440.0
sensor: <Sensor: weight>, event_resolution: 2:00:00
>>> -df2h.knowledge_horizons[0]
Timedelta('0 days 02:00:00')

14 changes: 9 additions & 5 deletions timely_beliefs/docs/slicing.md
@@ -13,9 +13,10 @@ Being an extension of the pandas DataFrame, all of pandas excellent slicing meth
For example, to select all beliefs about events from 11 AM onwards:

>>> from datetime import datetime, timedelta
>>> from pytz import utc
>>> df = timely_beliefs.examples.get_example_df()
>>> df[df.index.get_level_values("event_start") >= datetime(2000, 1, 3, 11, tzinfo=utc)]
>>> import pytz
>>> import timely_beliefs as tb
>>> df = tb.examples.get_example_df()
>>> df[df.index.get_level_values("event_start") >= datetime(2000, 1, 3, 11, tzinfo=pytz.utc)]

Besides these, `timely-beliefs` provides custom methods to conveniently slice through time in different ways.

@@ -38,12 +39,13 @@ Select the latest forecasts from a rolling viewpoint (beliefs formed at least 2
1.0000 300
2000-01-03 12:00:00+00:00 2 days 11:15:00 Source A 0.1587 396
0.5000 400
sensor: <Sensor: weight>, event_resolution: 0:15:00

## Fixed viewpoint

Select the latest forecasts from a fixed viewpoint (beliefs formed at least before 2 AM January 1st 2000):

>>> df.fixed_viewpoint(datetime(2000, 1, 1, 2, tzinfo=utc)).head(8)
>>> df.fixed_viewpoint(datetime(2000, 1, 1, 2, tzinfo=pytz.utc)).head(8)
event_value
event_start belief_time source cumulative_probability
2000-01-03 09:00:00+00:00 2000-01-01 01:00:00+00:00 Source A 0.1587 99
@@ -54,12 +56,13 @@ Select the latest forecasts from a fixed viewpoint (beliefs formed at least befo
2000-01-03 10:00:00+00:00 2000-01-01 01:00:00+00:00 Source A 0.1587 198
0.5000 200
0.8413 202
sensor: <Sensor: weight>, event_resolution: 0:15:00

## Belief history

Select a history of beliefs about a single event:

>>> df.belief_history(datetime(2000, 1, 3, 11, tzinfo=utc))
>>> df.belief_history(datetime(2000, 1, 3, 11, tzinfo=pytz.utc))
event_value
belief_time source cumulative_probability
2000-01-01 00:00:00+00:00 Source A 0.1587 270
@@ -72,3 +75,4 @@ Select a history of beliefs about a single event:
0.8413 303
Source B 0.5000 0
1.0000 300
sensor: <Sensor: weight>, event_resolution: 0:15:00
6 changes: 4 additions & 2 deletions timely_beliefs/examples/beliefs_data_frames.py
@@ -9,15 +9,17 @@ def sixteen_probabilistic_beliefs() -> BeliefsDataFrame:
"""Nice BeliefsDataFrame to show.
For a single sensor, it contains 4 events, for each of which 2 beliefs by 2 sources each, described by 2 or 3
probabilistic values, depending on the source.
Note that the event resolution of the sensor is 15 minutes.
Note that the event resolution of the sensor is 15 minutes, while the event start frequency is 1 hour.
"""

n_events = 4
n_beliefs = 2
n_sources = 2
true_value = 100

example_sensor = Sensor(event_resolution=timedelta(minutes=15), name="Sensor 1")
example_sensor = Sensor(
event_resolution=timedelta(minutes=15), name="weight", unit="kg"
)
example_source_a = BeliefSource(name="Source A")
example_source_b = BeliefSource(name="Source B")
