Helper functions for Forecasting


helper methods for forecasting

approximate_index(dataset, findvalue)[source]

Return the index of findvalue in dataset, using an optimized search procedure. This assumes the dataset contains continuous, increasing values. Typically, these are timestamps.

  • dataset (list) – a continuous list of values (e.g. timestamps)
  • findvalue (int) – the value whose index is to be found.
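
The internals of approximate_index are not shown here. As a minimal sketch of one way to look up an index in a sorted list of timestamps, the following uses a binary search from the standard bisect module and returns the index of the closest entry. The name approximate_index_sketch and the "closest entry" behaviour are assumptions for illustration, not the library's implementation.

```python
from bisect import bisect_left


def approximate_index_sketch(dataset, findvalue):
    """Hypothetical stand-in: index of the entry closest to findvalue.

    Assumes dataset is sorted in increasing order (e.g. timestamps).
    """
    if not dataset:
        raise ValueError("dataset must not be empty")
    # Position of the first entry that is >= findvalue.
    pos = bisect_left(dataset, findvalue)
    if pos == 0:
        return 0
    if pos == len(dataset):
        return len(dataset) - 1
    before, after = dataset[pos - 1], dataset[pos]
    # Pick whichever neighbour is nearer; ties go to the earlier entry.
    return pos if after - findvalue < findvalue - before else pos - 1


timestamps = list(range(0, 6000, 600))  # 0, 600, 1200, ...
print(approximate_index_sketch(timestamps, 1250))  # prints 2
```

Binary search needs O(log n) comparisons, which is why the increasing-values assumption matters.
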
cached_data(name, data_function=None, max_age=0)[source]

Store and retrieve data from a cache on the filesystem. The function first tries to retrieve the cached data. If there is none, or if the data is too old, data_function is called and its result is stored in the cache.

  • name (string) – name of cache file
  • data_function (function) – A function that returns the data to be stored. If data_function is None and the cache is invalid, cached_data returns None.
  • max_age (int) – The maximum age (real time) in seconds that the cache may reach before it becomes invalid.

Returns: data or None
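
The caching behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: the pickle storage format, the use of the file modification time as the cache age, and the extra cache_dir parameter are all hypothetical.

```python
import os
import pickle
import tempfile
import time


def cached_data_sketch(name, data_function=None, max_age=0, cache_dir="."):
    """Hypothetical stand-in for a file-based cache with a maximum age."""
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        # Assumption: the cache age is the time since the file was written.
        age = time.time() - os.path.getmtime(path)
        if age <= max_age:
            with open(path, "rb") as f:
                return pickle.load(f)
    if data_function is None:
        # Cache missing or too old, and nothing to refresh it with.
        return None
    data = data_function()
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data


# Usage: the expensive function only runs once within max_age.
with tempfile.TemporaryDirectory() as tmp:
    calls = []

    def expensive():
        calls.append(1)
        return {"answer": 42}

    first = cached_data_sketch("demo.cache", expensive, max_age=3600, cache_dir=tmp)
    second = cached_data_sketch("demo.cache", expensive, max_age=3600, cache_dir=tmp)
    missing = cached_data_sketch("other.cache", None, max_age=3600, cache_dir=tmp)
```
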


Interpolates a day of the year to a seasonal value, where 1 = winter and 0 = summer. Input: int between 0 and 365. Output: float between 0 and 1.
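
The interpolation formula itself is not given. One possible reading, shown here purely as a sketch, is a linear ramp that peaks at the turn of the year and bottoms out at midyear. The function name seasonal_weight_sketch, the linear shape, and the assumption that winter is centred on day 0 are all hypothetical.

```python
def seasonal_weight_sketch(year_day):
    """Hypothetical mapping from a year day (0..365) to 1=winter, 0=summer.

    Assumption: linear interpolation, with winter centred on day 0/365
    and summer centred on day 182.5.
    """
    if not 0 <= year_day <= 365:
        raise ValueError("year_day must be between 0 and 365")
    midyear = 365 / 2
    return abs(year_day - midyear) / midyear


print(seasonal_weight_sketch(0))    # prints 1.0 (deep winter)
print(seasonal_weight_sketch(365))  # prints 1.0
```
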

perdelta(start, end, delta)[source]

Generator function that yields dates. Works like range(start, stop, step), but for dates.

  • start, end (datetime) – dates between which to iterate
  • delta (timedelta) – the step width
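
A generator with this behaviour can be sketched in a few lines. The version below assumes that, as with range, the end value is excluded and delta is positive. The name perdelta_sketch is hypothetical.

```python
from datetime import date, timedelta


def perdelta_sketch(start, end, delta):
    """Yield start, start + delta, ... while the value is before end."""
    current = start
    while current < end:
        yield current
        current += delta


days = list(perdelta_sketch(date(2024, 1, 1), date(2024, 1, 4), timedelta(days=1)))
print(days)  # three dates: 1, 2 and 3 January 2024
```
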


class DataLoader[source]

This class reads data from CSV files formatted in a specific way. The files are cached in memory to enable fast re-reads.

classmethod load_from_file(filepath, column_name, delim='\t', date_name='Datum', sampling_interval=600)[source]

Load a time series from a CSV file. This assumes that the CSV is formatted in the following way:

Date header   Row Header1   Row Header2   Row Header N
Timestamp0    Row1Value0    Row2Value0    RowNValue0
Timestamp1    ...           ...           ...

If the values in the file are not sampled evenly, because the file contains skips, blackouts, etc., the data will be resampled evenly by copying certain data (see evenly_sampled()).

  • column_name (string) – The name of the column (in the csv) to retrieve
  • delim (string) – The delimiter between values of a row. Default is Tab.
  • date_name (string) – The header name of the date column
  • sampling_interval (int) – The interval, in seconds, at which the data in the file is sampled.
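
To make the expected file layout concrete, here is a small sketch that parses text in this format with the standard csv module and extracts one column together with the date column. It is not the DataLoader implementation: the name read_column_sketch, the returned dictionary shape, and the assumption that timestamps are integer seconds and values are floats are all hypothetical.

```python
import csv
import io


def read_column_sketch(csv_text, column_name, delim="\t", date_name="Datum"):
    """Hypothetical reader: returns the date column and one value column."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delim)
    data = {date_name: [], column_name: []}
    for row in reader:
        # Assumption: timestamps are integer seconds, values are floats.
        data[date_name].append(int(row[date_name]))
        data[column_name].append(float(row[column_name]))
    return data


sample = "Datum\tPower\tTemp\n0\t1.5\t20.0\n600\t1.7\t20.5\n"
result = read_column_sketch(sample, "Power")
print(result)  # {'Datum': [0, 600], 'Power': [1.5, 1.7]}
```
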
classmethod evenly_sampled(data, date_name='Datum', sampling_interval=600)[source]

Returns a version of data in which every value has a corresponding timestamp that is at most roughly sampling_interval seconds after the previous one. The interval is a maximum: if the data contains values that are closer together than sampling_interval, they are left unchanged.

The data used to fill gaps is chosen with the characteristics of electrical data in mind: the value from one week earlier is used if present, otherwise the value from one day earlier, and if neither exists the last known value is repeated.

  • data (dict) – dictionary with column names as keys and column data as values
  • date_name (string) – name of the date column
  • sampling_interval (int) – the number of seconds between each consecutive sample
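
The gap-filling order described above (one week ago, then one day ago, then the last value) can be sketched as follows. This is an illustration only: the name evenly_sampled_sketch is hypothetical, timestamps are assumed to be integer seconds in increasing order, and a week-old or day-old value only counts as present if a sample exists at exactly that timestamp.

```python
WEEK = 7 * 24 * 3600
DAY = 24 * 3600


def evenly_sampled_sketch(data, date_name="Datum", sampling_interval=600):
    """Hypothetical gap filler for a dict of equally long column lists."""
    columns = [key for key in data if key != date_name]
    out = {key: [] for key in data}
    lookup = {}  # timestamp -> row index in out

    def append_row(timestamp, values):
        lookup[timestamp] = len(out[date_name])
        out[date_name].append(timestamp)
        for col in columns:
            out[col].append(values[col])

    for i, timestamp in enumerate(data[date_name]):
        if out[date_name]:
            # Insert synthetic samples until the gap is closed.
            next_ts = out[date_name][-1] + sampling_interval
            while next_ts < timestamp:
                for offset in (WEEK, DAY):
                    src = lookup.get(next_ts - offset)
                    if src is not None:
                        break
                else:
                    # Neither a week-old nor a day-old sample: repeat the last row.
                    src = len(out[date_name]) - 1
                append_row(next_ts, {col: out[col][src] for col in columns})
                next_ts += sampling_interval
        append_row(timestamp, {col: data[col][i] for col in columns})
    return out


gappy = {"Datum": [0, 600, 2400], "P": [1.0, 2.0, 5.0]}
filled = evenly_sampled_sketch(gappy)
print(filled["Datum"])  # [0, 600, 1200, 1800, 2400]
print(filled["P"])      # [1.0, 2.0, 2.0, 2.0, 5.0]
```

Samples that are closer together than sampling_interval pass through unchanged, matching the "maximum interval" behaviour described above.
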