This repository contains two alternative implementations for forecasting photovoltaic power prediction based on weather data. It was developed for my bachelor thesis that compares those two approaches. Two data sources serve as weather and power output sample data.
I recommend pipenv to setup the project. With pipenv installed run pipenv install
from the project root and it will install all necessary dependencies. Then use pipenv shell
to enter the shell where all dependencies will be available or just pipenv run <command>
to execute a specific command.
If you don't use pipenv, dependencies are specified in the Pipfile and the versions i use can be found in the Pipfile.lock.
To use the PVWatts Service described later you need to obtain an API key. Then, you have to set it as environment variable (or put it in the code directly). If you use pipenv this can be done in a .env
file.
Examples on how to use the project can be found in the example files in the root directory. You can directly run those files after completing the necessary steps to obtain data described in the next section.
pipenv run python example_arima_pvwatts.py
pipenv run python example_arima_uq.py
pipenv run python example_svr_pvwatts.py
pipenv run python example_svr_uq.py
For plotting with matplotlib, pandas issues a future warning, so i put the following line after the imports:
from pandas.plotting import register_matplotlib_converters; register_matplotlib_converters()
There are two datasources for which an importer is available that parses the data in the right format. Of course it is possible to use other sources as well. Firstly there is a PVWatts wrapper and secondly one for data obtained from UQ Solar.
Prepared data from PVWatts looks similar to this:
time | tamb | wspd | power |
---|---|---|---|
2019-06-20 06:00:00 | 16.0 | 0.0 | 957.201 |
2019-06-20 07:00:00 | 18.0 | 1.5 | 1648.060 |
2019-06-20 08:00:00 | 21.0 | 3.1 | 2260.658 |
2019-06-20 09:00:00 | 23.0 | 4.1 | 2647.271 |
2019-06-20 10:00:00 | 24.0 | 2.6 | 2838.976 |
It can be obtained via the public PVWatts API. In order to get an API key you need to sign up at the NREL Developer Network. The API key then needs to be put in the environment variable PVWATTS_API_KEY
. Alternatively, you can insert it into pvwatts directly. When that is done you can use
from importers import pvwatts
data = pvwatts.load()
This calls the API and parses the result as a pandas DataFrame which can be passed to the forecasting modules. You can also pass the following optional parameters to load
:
- system_capacity
- module_type
- losses
- array_type
- tilt
- azimuth
- address
- lat
- lon
- radius
For information on these parameters please refer to the PVWatts V6 API Description. All parameters have default values so calling the method without parameters is possible as well. The return data is indexed using a DateTimeIndex. Since PVWatts does not specify dates but always returns data for a whole year, the year 2019 will be set fixed for each DataFrame returned from this module.
If you have a PVWatts json response saved in a json file it is also possible to parse that file directly using the following convenience method:
pvwatts.load_from_json('filepath.json')
Prepared data from the UQ Solar live feed looks similar to this:
time | airtemp | humidity | windspeed | winddirection | power |
---|---|---|---|---|---|
2015-06-20 07:00:00 | 9.27 | 75.24 | 1.72 | 191.47 | 25302.00 |
2015-06-20 08:00:00 | 11.28 | 68.72 | 2.14 | 201.45 | 78240.83 |
2015-06-20 09:00:00 | 13.38 | 59.02 | 2.59 | 214.77 | 128523.08 |
2015-06-20 10:00:00 | 14.49 | 54.19 | 2.92 | 201.67 | 162968.83 |
2015-06-20 11:00:00 | 15.65 | 51.89 | 2.34 | 203.23 | 172535.83 |
It can downloaded on the live feed website. Choose a PV Site
and a PV Array
from the sidebar on the right. Then click Download Data
and then Download Daily Logs
. From there you can specify a date range (I recommend a year), and then download a Power & Energy
file as well as a Weather
file. Make sure you download both with the same date ranges specified.
Now you can use
from importers import uq
data = uq.load('power_file.csv', 'weather_file.csv')
and substitute both parameters with the file paths for each respective file. The load
method combines both files again into a pandas DataFrame, ready to be passed to a forecasting module.
Now that a DataFrame with features and power data is present you can make forecasts. Both importers return a DataFrame which has different features, but both have a power
column which represents the power output.
Since the data from both importers has a DateTimeIndex it can be split up into training and testing data like this:
training_data = data['20190601':'20190607'] # first week of june 2019
testing_data = data['20190608':'20190614'] # second week of june 2019
Here the dates are converted from strings implicitly. For example '20190601'
depicts 2019/06/01 or June 01, 2019.
This algorithm uses the Scikit-Learn implementation of SVR which is based on libsvm. To use it the following class is available:
from predictors.svr_model import SVRModel
Making a prediction without scaling the data:
model = SVRModel(scaling=False)
model.fit(training_data, kernel='rbf', C=1e3, gamma=0.1, epsilon=0.1)
model.predict(testing_data) # returns the prediction data frame containing the testing features and the power prediction
model.prediction.power # use this to access the power prediction
Filtering out the power
column from the testing_data
frame is not necessary, the SVRModel
class does that automatically. Optionally it is possible to pass filter=['airtemp', 'humidity']
or similar to fit
to restrict the features used for forecasting. It is not necessary to filter the data manually this way.
The SVR variables kernel
, C
, gamma
and epsilon
are all optional, they all have default values.
To let the model scale the data before applying the regression you can set the scaling parameter to True
(this is the default value):
model = svr_model(base_data=data, scaling=True)
That way you do have to specify base_data
which can be a dataset for another year. This will not be used for regression, solely for fitting a feasible scaler
Object. For more information on that refer to the StandartScaler implementation of Scikit-Learn.
This algorithm makes use on the pmdarima package (formerly pyramid-arima). It can be used by importing the necessary class:
from predictors.arima_model import ARIMAModel
Now to make a prediction without using exogenous variables do the following:
model = ARIMAModel() # scaling is set to true automatically
model.fit(training_data, order=(2,1,4), seasonal_order(3,1,2, 24), use_exogenous=False)
model.predict(hours=48) # returns the prediction
model.prediction.power # use this to access the power predictionn
Providing testing features is not necessary because the model was fit without exogenous variables. In this case, the returned DataFrame will only have a power
column. The order
specifies the (p,d,q)
parameters of the model. The seasonal_order
specifies the (P,D,Q,s)
parameters. The scaling parameter can be set when creating the model, by default it is set to True
but it can be unset with:
model = ARIMAModel(scaling=false)
If you want to add exogenous variables set this parameter:
model.fit(training_data, order=(2,1,4), seasonal_order(3,1,2, 24), use_exogenous=True) # use_exogenous is True by default
model.predict(testing_data=testing_data)
This way you have to specify testing_data
when predicting. The power
column will be automatically removed when making the actual prediction. The prediction DataFrame will now have the testing_data columns where the power
column contains the predicted values.
If you want to let the algorithm automatically find appropriate p, q, P and Q
parameters use the following call:
model.fit_auto(training_data, p=(0,5), q=(0,5), P=(0,5), Q=(0,5), d=1, D=1)
The tuples define ranges in which the algorithm will search for optimal parameters. The d
and D
can be left out to automatically determine those parameters aswell. As with the fit
method, use_exogenous
will be set to True
by default, but can also be specified to be
False
.
As it is possible with the SVRModel
, you can also pass a filter array to fit
and fit_auto
to restrict which features should be included in the forecasting process.