openclimatefix / open-source-quartz-solar-forecast
Open Source Solar Site Level Forecast

License: MIT License



Quartz Solar Forecast


The aim of the project is to build an open-source PV forecast that is free and easy to use. The forecast provides the expected generation in kW for 0 to 48 hours ahead for a single PV site.

Open Climate Fix also provides a commercial PV forecast; please get in touch at [email protected]

We recently presented the Quartz Solar Forecast project at FOSDEM 2024 (Free and Open source Software Developers' European Meeting), giving an introduction to Open Climate Fix's motivation for this project and its impact on helping organisations optimise resources. To learn more about the predictive model's functionality, see the Video Recording.

The current model uses GFS or ICON NWPs to predict the solar generation at a site:

from quartz_solar_forecast.forecast import run_forecast
from quartz_solar_forecast.pydantic_models import PVSite
from datetime import datetime

# make a pv site object
site = PVSite(latitude=51.75, longitude=-1.25, capacity_kwp=1.25)

# run model for today, using ICON NWP data
predictions_df = run_forecast(site=site, ts=datetime.today(), nwp_source="icon")

which should result in a time series similar to this one:

https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/predictions.png?raw=true

A colab notebook providing some examples can be found here.

Generating Forecasts

To generate solar forecasts and save them into a CSV file, follow these steps:

  1. Navigate to the scripts directory:

cd scripts

  2. Run the forecast_csv.py script with your desired inputs:

python forecast_csv.py --init_time_freq <hours> --start_datetime <start> --end_datetime <end> --site_name <name>

Replace the --init_time_freq, --start_datetime, --end_datetime, and --site_name with your desired forecast initialization frequency (in hours), start datetime, end datetime, and the name of the forecast or site, respectively.

Output

The script will generate solar forecasts at the specified intervals between the start and end datetimes. The results will be combined into a CSV file named using the site name, start and end datetimes, and the frequency of forecasts. This file will be saved in the scripts/csv_forecasts directory.
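The looping that forecast_csv.py performs can be sketched as follows. This is a simplified stand-in with a dummy forecast function in place of run_forecast, and the filename pattern is only illustrative:

```python
import pandas as pd

def make_forecast(init_time: pd.Timestamp) -> pd.DataFrame:
    # Stand-in for run_forecast(site=..., ts=init_time); returns dummy values.
    index = pd.date_range(init_time, periods=4, freq="h")
    return pd.DataFrame({"init_time": init_time, "forecast_kw": 0.5}, index=index)

# Forecast initialisation times between start and end, every 6 hours (example values)
init_times = pd.date_range("2021-05-08", "2021-05-09", freq="6h")

# Run one forecast per init time and combine the results into a single CSV
combined = pd.concat([make_forecast(t) for t in init_times])
combined.to_csv("demo_2021-05-08_2021-05-09_6h.csv")
```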

Installation

The source code is currently hosted on GitHub at: https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast

Binary installers for the latest released version are available at the Python Package Index (PyPI)

pip install quartz-solar-forecast

You might need to install the following package first:

conda install -c conda-forge pyresample

This can resolve the `___kmpc_for_static_fini` import error (see the issue below).

Model

Two models are currently available to make predictions.

Gradient Boosting Model (default)

The model uses GFS or ICON NWPs to predict the solar generation at a site. It is a gradient-boosted tree model and uses 9 NWP variables. It is trained on 25,000 PV sites with over 5 years of PV history, which is available here. The training of this model is handled in pv-site-prediction. TODO: we need to benchmark this forecast.

The 9 NWP variables, from the Open-Meteo documentation, are listed below with their appropriate units.

  1. Visibility (km), or vis: Distance at which objects can be clearly seen. Can affect the amount of sunlight reaching solar panels.
  2. Wind Speed at 10 meters (km/h), or si10: Wind speed measured at a height of 10 meters above ground level. Important for understanding weather conditions and potential impacts on solar panels.
  3. Temperature at 2 meters (°C), or t: Air temperature measured at 2 meters above the ground. Can affect the efficiency of PV systems.
  4. Precipitation (mm), or prate: Precipitation (rain, snow, sleet, etc.). Helps to predict cloud cover and potential reductions in solar irradiance.
  5. Shortwave Radiation (W/m²), or dswrf: Solar radiation in the shortwave spectrum reaching the Earth's surface. A measure of the potential solar energy available for PV systems.
  6. Downward Longwave Radiation (W/m²), or dlwrf: Longwave (infrared) radiation emitted by the Earth back into the atmosphere.
  7. Cloud Cover low (%), or lcc: Percentage of the sky covered by clouds at low altitudes. Impacts the amount of solar radiation reaching the ground, and hence the PV system.
  8. Cloud Cover mid (%), or mcc: Percentage of the sky covered by clouds at mid altitudes.
  9. Cloud Cover high (%), or hcc: Percentage of the sky covered by clouds at high altitudes.

We also use the following features:
  • poa_global: The plane of array irradiance, i.e. the amount of solar radiation that strikes the solar panel.
  • poa_global_now_is_zero: A boolean variable that is true if poa_global is zero at the current time. This is used to help the model learn that PV generation is zero at night.
  • capacity (kW): The capacity of the PV system in kW.
  • A flag for each of the above variables indicating whether it is NaN or not.

The model also uses the following variables, which are currently all set to NaN:

  • recent_power: The mean power over the last 30 minutes
  • h_mean: The mean of the recent PV data over the last 7 days
  • h_median: The median of the recent PV data over the last 7 days
  • h_max: The max of the recent PV data over the last 7 days
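The NaN handling above can be sketched in pandas. This is an illustrative reconstruction, not the actual pv-site-prediction code; the column names mirror the feature names listed above:

```python
import numpy as np
import pandas as pd

# Example feature frame; recent PV history is unavailable at inference, so it is NaN
features = pd.DataFrame({
    "dswrf": [450.0, np.nan, 120.0],
    "recent_power": [np.nan, np.nan, np.nan],  # no live PV data in this project
})

# For each feature x, add a companion x_isnan flag, then fill the NaNs
for col in list(features.columns):
    features[f"{col}_isnan"] = features[col].isna().astype(float)
features = features.fillna(0.0)
```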

XGBoost

The second option is an XGBoost model, which uses Numerical Weather Prediction (NWP) input features derived from open-meteo variables. Open-meteo provides several different types of data. To train this model, hourly forecast data from the historical weather API was used. The time period is restricted by the availability of the target solar energy data from the panels, and covers 2018 to 2021. Additional information about the time, location and specifics of the panel is also used. The weather features are listed below, with the descriptions given by open-meteo.

  • Temperature at 2m (ºC): Air temperature at 2 meters above ground
  • Relative Humidity at 2m (%): Relative humidity at 2 meters above ground
  • Dewpoint at 2m (ºC): Dew point temperature at 2 meters above ground
  • Precipitation (rain + snow) (mm): Total precipitation (rain, showers, snow) sum of the preceding hour
  • Surface Pressure (hPa): Atmospheric air pressure reduced to mean sea level (msl) or pressure at surface. Typically pressure on mean sea level is used in meteorology. Surface pressure gets lower with increasing elevation.
  • Cloud Cover Total (%): Total cloud cover as an area fraction
  • Cloud Cover Low (%): Low level clouds and fog up to 3 km altitude
  • Cloud Cover Mid (%): Mid level clouds from 3 to 8 km altitude
  • Cloud Cover High (%): High level clouds from 8 km altitude
  • Wind Speed at 10m (km/h): Wind speed at 10, 80, 120 or 180 meters above ground. Wind speed on 10 meters is the standard level.
  • Wind Direction (10m): Wind direction at 10 meters above ground
  • Is day or Night: 1 if the current time step has daylight, 0 at night
  • Direct Solar Radiation (W/m2): Direct solar radiation as average of the preceding hour on the horizontal plane and the normal plane (perpendicular to the sun)
  • Diffuse Solar Radiation DHI (W/m2): Diffuse solar radiation as average of the preceding hour
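As an illustration of how these hourly variables could be requested, the sketch below assembles (but does not send) a query against open-meteo's historical-weather ("archive") API. The endpoint and variable names follow open-meteo's public docs, but this is an assumption about the request shape, not code from this repo:

```python
# Build (but don't send) a request for the hourly weather features listed above.
from urllib.parse import urlencode

BASE_URL = "https://archive-api.open-meteo.com/v1/archive"

params = {
    "latitude": 51.75,
    "longitude": -1.25,
    "start_date": "2018-01-01",
    "end_date": "2021-12-31",
    "hourly": ",".join([
        "temperature_2m", "relative_humidity_2m", "dewpoint_2m",
        "precipitation", "surface_pressure",
        "cloudcover", "cloudcover_low", "cloudcover_mid", "cloudcover_high",
        "windspeed_10m", "winddirection_10m", "is_day",
        "direct_radiation", "diffuse_radiation",
    ]),
}

url = f"{BASE_URL}?{urlencode(params)}"
# The JSON response (e.g. via requests.get(url).json()) holds one list per
# variable under the "hourly" key, aligned with an ISO-8601 "time" list.
```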

To use this model, specify model="xgb" in run_forecast, i.e. run_forecast(site=site, model="xgb", ts=datetime.today()).

Model Comparisons

The following plot shows example predictions from both models for the same time period. Additionally, for the Gradient Boosting model (default), the results from the two different data sources are shown.

model comparison Predictions using the two different models and different data sources.

Known restrictions

  • The model is trained on UK Met Office NWPs, but when running inference we use GFS data from Open-meteo. The differences between GFS and the UK Met Office data could lead to some odd behaviours.
  • Depending on whether the timestamp for the prediction lies more than 90 days in the past, different data sources are used for the NWP. If we predict within the last 90 days, we can use ICON or GFS from the open-meteo Weather Forecast API. Since ICON doesn't provide visibility, this parameter is queried from GFS in any case. If the date for the prediction is further back in time, a reanalysis model of historical data is used (open-meteo | Historical Weather API). The historical weather API doesn't provide visibility at all, so it is set to a maximum of 24,000 meters in this case. This can lead to some loss of precision.
  • The model was trained and tested only over the UK; applying it to other geographical regions should be done with caution.
  • When using the XGBoost model, only hourly predictions within the last 90 days are available, for data consistency.
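The 90-day switch described above can be sketched as follows; the function and names are hypothetical, as the real selection logic lives inside the package:

```python
from datetime import datetime, timedelta

MAX_VISIBILITY_M = 24_000  # fallback when the historical API lacks visibility

def pick_nwp_source(ts: datetime, now: datetime) -> str:
    """Return which open-meteo API a prediction timestamp falls under."""
    if now - ts <= timedelta(days=90):
        return "forecast_api"   # ICON or GFS (visibility always from GFS)
    return "historical_api"     # reanalysis; visibility fixed at MAX_VISIBILITY_M
```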

Evaluation

Gradient Boosting Model (default)

To evaluate the model we use the UK PV dataset and the ICON NWP dataset. All the data is publicly available and the evaluation script can be run with the following command:

python scripts/run_evaluation.py

The test dataset we used is defined in quartz_solar_forecast/dataset/testset.csv. This contains 50 PV sites, each with 50 unique timestamps. The data is from 2021.

The results of the evaluation are as follows. The MAE is 0.1906 kW across all horizons.

Horizon (hours)   MAE [kW]       MAE [%]
0                 0.202 ± 0.03   6.2
1                 0.211 ± 0.03   6.4
2                 0.216 ± 0.03   6.5
3 - 4             0.211 ± 0.02   6.3
5 - 8             0.191 ± 0.01   6.0
9 - 16            0.161 ± 0.01   5.0
17 - 24           0.173 ± 0.01   5.3
24 - 48           0.201 ± 0.01   6.1

If we exclude nighttime, then the average MAE [%] from 0 to 36 forecast hours is 13.0%.

Notes:

  • The MAE in % is the MAE divided by the capacity of the PV site. We acknowledge there are a number of different ways to do this.
  • It is slightly surprising that the 0-hour forecast horizon and the 24-48 hour horizon have a similar MAE. This may be because the model is trained expecting live PV data, but currently in this project we provide no live PV data.
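The MAE normalisation described in the first note can be sketched as follows, with illustrative numbers rather than values from the evaluation:

```python
import numpy as np

def mae_kw_and_pct(pred_kw, actual_kw, capacity_kwp):
    """MAE in kW, and the same MAE as a percentage of site capacity."""
    mae = float(np.mean(np.abs(np.asarray(pred_kw) - np.asarray(actual_kw))))
    return mae, 100.0 * mae / capacity_kwp

mae, mae_pct = mae_kw_and_pct([0.5, 0.7], [0.4, 0.9], capacity_kwp=1.25)
# mae ≈ 0.15 kW, mae_pct ≈ 12.0 %
```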

XGBoost

The model was trained and evaluated on 1147 solar panels and tested on 37 independent locations. Intensive hyperparameter tuning was performed. The model provides a feature importance list. Different metrics were calculated and analysed. Finally, the model was evaluated using the Mean Absolute Error (MAE). The MAE over the entire test data is 0.12 kW; when night times are excluded, the MAE is 0.21 kW. A plot with the MAE for each panel in the test set is shown in the figure below.

MAE Mean absolute error for the panels in the test set.

Notes:

  • The evaluation per horizon is not available for this model, as it is not provided by the open-meteo data.

Abbreviations

  • NWP: Numerical Weather Predictions
  • GFS: Global Forecast System
  • PV: Photovoltaic
  • MAE: Mean Absolute Error
  • ICON: ICOsahedral Nonhydrostatic
  • kW: Kilowatt

FOSDEM

FOSDEM is a free event for software developers to meet, share ideas and collaborate. Every year, thousands of developers of free and open source software from all over the world gather at the event in Brussels. OCF presented the Quartz Solar Forecast project at FOSDEM 2024. The original FOSDEM video is available at Quartz Solar OS: Building an open source AI solar forecast for everyone. It is also available on YouTube.

Running the dashboard locally

Start the API first (port 8000):

cd api
python main.py

Start the frontend (port 5137):

cd dashboards/dashboard_1
npm install
npm run dev

There is also a Streamlit dashboard in dashboards/dashboard_2 that can be used.

Contribution

We welcome contributions of other models.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

  • Peter Dudfield 💻
  • Megawattz 🤔 📢
  • EdFage 📖 💻
  • Chloe Pilon Vaillancourt 📖
  • rachel tipton 📢
  • armenbod 🖋 💻
  • Shreyas Udaya 📖
  • Aryan Bhosale 📖 💻
  • Francesco 💻
  • Rosheen Naeem 📖
  • Bikram Baruah 💻
  • Jakob Gebler 🐛
  • Om Bhojane 💻
  • Chris Adams 🤔
  • Mudra Patel 📖
  • Diego Marvid 📖
  • Frauke Albrecht 💻
  • Pablo Alfaro 👀
  • KelRem 💬
  • Lia Chen 💻

This project follows the all-contributors specification. Contributions of any kind welcome!


open-source-quartz-solar-forecast's Issues

Add FOSDEM Open Quartz talk to ReadMe

We launched Open Quartz at the FOSDEM conference a couple of months ago. The talk was recorded and is a good introduction to OCF as well as how the model currently works. It would be great to add a link and a sentence or two about the talk to the Readme.

Here are the links:

Here's a link to the video on YouTube: https://www.youtube.com/watch?v=NAZ2VeiN1N8

Here's a link to the original FOSDEM video: https://fosdem.org/2024/schedule/event/fosdem-2024-2960-quartz-solar-os-building-an-open-source-ai-solar-forecast-for-everyone/

I would just use the link to the YouTube video for now.

setup.py doesn't import model-0.3.0.pkl

Describe the bug

When I run python setup.py install, all the Python packages install, but the "models" subdirectory and model-0.3.0.pkl file do not.

To Reproduce

Steps to reproduce the behavior:

  1. Set up a new virtual environment
  2. Run python setup.py install

Expected behavior

In env/lib/python3.11/site-packages, all the .py and the .pkl files should be installed.

Fix

I think I have fixed this issue by adding

`package_data={"": ["models/model-0.3.0.pkl"]}` 

to setup() in setup.py, as per https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#package-data

Want me to submit a PR for this?

remove time -> default to now

Detailed Description

It would be great if the main function run_forecast could optionally take a ts. If none was provided then it should default to now (rounded down to 15 mins)

Context

  • nice to have things nice and simple

Possible Implementation

  • in the function, default ts=None
  • pseudo-code:
    if ts is None: ts = pd.Timestamp.now().floor('15min')

Challenge: new model

Can you make a new model and beat the current evaluation metrics?

You need to build a forecast to predict PV generation. The PV dataset is all here, and we also want the model to run like the current model, i.e. pulling NWP data from open-meteo.

We need a model that can forecast 48 hours ahead, in 15-minute intervals. We want it to run live without live PV data, but a good optional extra would be to include PV data.

This is fairly open-ended in order not to restrict anyone.

Add pin on requirements

Would be good to add a pin on requirements, perhaps matching minor versions at least.

for example pandas==2.1.3

Things to do

Short term

  • decide on the name
  • save model to github
  • publish psp to pypi, so we can install it here
  • #22
  • #16
  • #2
  • add ci for tests
  • #21
  • #4
  • #6
  • Add MIT license
  • #3

Medium term things

update `H` to `h`

Detailed Description

Small update for pandas
(screenshot: pandas deprecation warning for the uppercase "H" frequency)
There might be a few more ...

Context

good to keep warnings down

Possible Implementation

Change `H` to `h` in pandas frequency strings.
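A minimal sketch of the change, assuming pandas frequency strings like those used in this repo:

```python
import pandas as pd

# Uppercase "H" is deprecated in newer pandas; lowercase "h" is the replacement.
times = pd.date_range("2024-01-01", periods=48, freq="h")
```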

Download a specified time range of forecasts

The idea here is to create a script that can be run to generate forecasts over a specified time frame and combine the results into a csv (or perhaps netcdf) format.

Create a script that takes an input of:

  1. forecast init time frequency (i.e. how frequently the forecast should be run).
  2. Start datetime
  3. End datetime
  4. maybe name of the forecast or site.

The run_forecast script could be looped over for each init_time with the results combined into a csv.

The output should be saved in a reasonable location and use the different inputs as part of the file name.

The script could belong in the "script" folder.

It would also be useful for simple tests to be created for the script as well.

Tilt and Orientation params don't seem to affect output

Describe the bug

When defining 4 separate PV sites, each of 1kWp, at the same location but with each site at 90 degrees orientation from the next, the prediction is the same for all four sites. I would expect to see a shift in when PV generation starts and ramps up, especially for the East/West sites, but they appear to be the same.

To Reproduce

I've put together a Google Colab worksheet here: https://colab.research.google.com/drive/1HXaASf-cRihcwtLbw5gBx5QU7fhvix9L?usp=sharing

You need to run the pip install cell first, then when prompted restart the session, before executing the prediction and plotting cells.

Expected behavior

I would expect to see the east site start generating and ramp up before the west site, and for the south site to have a shorter generation time, but higher peak (based on my understanding of PV array azimuth differences).

Project GSoC: Solar Panel Inverter data

Work in progress ...

Project Description: Connect our Open Source Quartz Solar Forecast to live data from a PV system using the Enphase inverter. This would allow users with Enphase inverters to gain 20% more accurate PV forecasts and to see the accuracy of their forecast versus the actual generation figures.

Expected Outcome: Open Quartz Solar is able to use live data from the Enphase inverter, added as a module to the library.

Context

  • connected to #36
  • There might be some research to see what code is currently available, and what could be used

Possible Implementation

  • Build code that can easily get the last ~1 hour of Enphase data. I suspect users will have to pass in their Enphase API details for this.

Tidy up readme

  • Add what NWP variables are used. Would be good to give the units of these too

Recurring Runtime Error in run_eval Function: Stuck on Loading PV Data

Describe the bug

I am trying the run_eval function in evaluation.py. When I run the script, it gets stuck at some point, repeatedly loading PV data. I think it is getting stuck in the get_nwp() function: it prints "Made all NWP tasks, now getting the data", then states "Loading PV data" again and throws a runtime error, repeating this cycle over and over. I am guessing the issue is in this line of the get_nwp() function. I have attached the screenshots.

Note: The data files metadata.csv and pv.netcdf are already downloaded on my laptop.

To Reproduce

Steps to reproduce the behavior:

  1. Run the run_eval function in https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/evaluation.py

Screenshots

Screenshot 2024-03-09 at 6 00 44 PM Screenshot 2024-03-09 at 6 40 15 PM

More training data from Open-Meteo

Hi, author of Open-Meteo here.

I noticed you are using UKMO data for training, due to the 3-month limitation at Open-Meteo. I am working on an archive for high-resolution models. Data is available from GFS from April 2021 and from DWD ICON from November 2022.

The API endpoint is a bit slow at the moment, but performance and data availability will be improved in the coming weeks.

Let me know if this helps or if you need any other data!

EDIT: Most data is also available as open-data through an AWS S3 sponsorship.

Update readme.md with more features we use

We should update the readme.md to list all the features we use.

These are all made in pv-site-prediction and are:

{'prate_isnan', 'capacity', 't', 'dlwrf', 'vis', 'h_max_nan', 'h_max', 'recent_power', 'recent_power_nan', 'dswrf_isnan', 'lcc', 'dlwrf_isnan', 'h_mean_nan', 'prate', 'h_median_nan', 'mcc_isnan', 'poa_global', 'hcc', 'h_mean', 'hcc_isnan', 'poa_global_now_is_zero', 'vis_isnan', 'mcc', 'dswrf', 'si10_isnan', 'h_median', 'si10', 'lcc_isnan', 't_isnan'}

x_isnan is a feature indicating whether the x feature is NaN or not.

  • recent_power = The mean power over the last 30 minutes
  • capacity = The capacity of the site
  • h_mean = The mean of the recent PV data over the last 7 days
  • h_median = The median of the recent PV data over the last 7 days
  • h_max = The max of the recent PV data over the last 7 days
  • poa_global = The theoretical irradiance at that time and place. POA = Plane of Array
  • poa_global_now_is_zero = A flag for whether poa_global is zero or not

Evaluate model on 2022 data

Would be great to make this a general method for evaluating the data.

We might need to pull GFS data.

We perhaps need to define a test set so that we can compare with others

@jacobbieker @zakwatts what was the test set we used for pv-site-predictions? Perhaps we could use the same one here

Adjust for capacity

Adjust all sites to 1 kWp and then multiply the predictions by the site capacity.

Need to adjust the site here to 1 kWp and then scale the prediction here.

There is some argument for letting the model use this information, and 10 MW sites will behave differently to 1 kW sites, as they are more spread out geographically. However, I think until we have trained with more examples like that, it's worth having the simplest approach of scaling everything to 1 kW sites.
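The proposed scaling can be sketched as follows; the helper name is hypothetical, and the real forecast call is run_forecast:

```python
def scaled_forecast(forecast_1kwp_kw: float, capacity_kwp: float) -> float:
    """Run the model as if the site were 1 kWp, then scale by the true capacity."""
    return forecast_1kwp_kw * capacity_kwp

# e.g. a 0.4 kW prediction for a normalised 1 kWp site, scaled to a 5 kWp site
print(scaled_forecast(0.4, 5.0))  # 2.0
```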

Pass a dictionary of sites to generate forecasts for and save together

The idea here is to create a script that can be run to generate forecasts for multiple sites and combine these current forecasts into a csv for a single initialisation time. This could be done by creating a dictionary or list of sites or even reading in sites from a csv.

Create a script that takes an input of:

pv_id (a way to identify each site)
latitude
longitude
capacity

(for now we can leave out the tilt and orientation, but would be worth making a note where you might want to include tilt and orientation in the code in the future in a comment)

The output should be saved in a reasonable location and format.

The script could belong in the "script" folder.

It would also be useful for simple tests to be created for the script as well.

Evaluation script throwing path error

Describe the bug

Running the evaluation script throws an error, the path to the test dataset in run_eval() needs to be fixed.

To Reproduce

Steps to reproduce the behaviour:
Run the evaluation script

Expected behaviour

The run_eval function should load the dataset.

Add Visualization

It would be great to model the visualization on this ocean currents site and visualise the global (or UK) PV changes over the next 48 hours for use as well as to see trends.

Detailed Description

This would include adding a geographic map and plotting a heat map after predicting the PV values at each point on the map grid, superimposed on the geographic map.

Context

Users may want to see PV trends over the next period, or guidance on choosing an address for their site.

Possible Implementation

This seems to be easily done using the matplotlib library.
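A minimal matplotlib sketch of the proposed heat map, with made-up PV values on a coarse lat/lon grid (a real version would come from running the forecast at each grid point):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# Dummy PV prediction grid over roughly the UK (made-up values)
lats = np.linspace(50, 59, 10)
lons = np.linspace(-8, 2, 11)
pv_kw = np.random.rand(len(lats), len(lons))

fig, ax = plt.subplots()
mesh = ax.pcolormesh(lons, lats, pv_kw, cmap="viridis", shading="auto")
fig.colorbar(mesh, label="Predicted PV generation (kW)")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
fig.savefig("pv_heatmap.png")
```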

Add orientation and tilt

Add orientation and tilt in here

Need to test out if the model changes its results depending on these values. We might need to change something in psp

Eval: missing real generation_power values

Describe the bug

In evaluation, some of the real/expected values of generation_power are missing.

To Reproduce

Steps to reproduce the behavior:

  1. Run python scripts/run_evaluation.py with the following testset.csv (a small test to illustrate the bug):
pv_id,timestamp
9531,2021-05-08 10:00:00
  2. Some values are missing in results.csv in the generation_power column. Example results.csv:
,forecast_power,horizon_hour,pv_id,timestamp,generation_power
0,0.5382338261787198,0,9531,2021-05-08 10:00:00,
1,0.6805504837540712,1,9531,2021-05-08 11:00:00,
2,0.6950511506600507,2,9531,2021-05-08 12:00:00,
3,0.7507192765284325,3,9531,2021-05-08 13:00:00,
4,0.6222327619232007,4,9531,2021-05-08 14:00:00,
5,0.46010747864610435,5,9531,2021-05-08 15:00:00,
6,0.2792985706278065,6,9531,2021-05-08 16:00:00,
7,0.11883538094408863,7,9531,2021-05-08 17:00:00,0.19273080444335938
8,0.03377143967258781,8,9531,2021-05-08 18:00:00,0.05239992141723633
9,0.004003063439732276,9,9531,2021-05-08 19:00:00,0.0
10,0.0,10,9531,2021-05-08 20:00:00,0.0
11,0.0,11,9531,2021-05-08 21:00:00,0.0
12,0.0,12,9531,2021-05-08 22:00:00,0.0
13,0.0,13,9531,2021-05-08 23:00:00,0.0
14,0.0,14,9531,2021-05-09 00:00:00,0.0
15,0.0,15,9531,2021-05-09 01:00:00,0.0
16,0.0,16,9531,2021-05-09 02:00:00,0.0
17,0.0,17,9531,2021-05-09 03:00:00,0.0
18,0.0006960749166189652,18,9531,2021-05-09 04:00:00,0.0
19,0.021830932182701164,19,9531,2021-05-09 05:00:00,0.002466707944869995
20,0.04920016630787139,20,9531,2021-05-09 06:00:00,0.12896760559082032
21,0.16425460389406232,21,9531,2021-05-09 07:00:00,0.22877279663085937
22,0.2536578989915163,22,9531,2021-05-09 08:00:00,0.8414171752929688
23,0.3202140667660062,23,9531,2021-05-09 09:00:00,0.6911544189453125
24,0.6471341332970747,24,9531,2021-05-09 10:00:00,0.8355504150390625
25,0.7728203006501675,25,9531,2021-05-09 11:00:00,1.15409765625
26,0.6856276972650501,26,9531,2021-05-09 12:00:00,0.6737999877929688
27,0.7735971877911895,27,9531,2021-05-09 13:00:00,1.11731640625
28,0.6681219518935074,28,9531,2021-05-09 14:00:00,0.20179200744628906
29,0.49810158614186933,29,9531,2021-05-09 15:00:00,0.45828359985351563
30,0.3536980181332593,30,9531,2021-05-09 16:00:00,0.35039999389648435
31,0.19379396872601617,31,9531,2021-05-09 17:00:00,0.2593247985839844
32,0.05294271353381089,32,9531,2021-05-09 18:00:00,0.17835600280761718
33,0.00577927292344424,33,9531,2021-05-09 19:00:00,0.07551947784423828
34,0.0,34,9531,2021-05-09 20:00:00,3.235164058423834e-09
35,0.0,35,9531,2021-05-09 21:00:00,0.0
36,0.0,36,9531,2021-05-09 22:00:00,0.0
37,0.0,37,9531,2021-05-09 23:00:00,0.0
38,0.0,38,9531,2021-05-10 00:00:00,0.0
39,0.0,39,9531,2021-05-10 01:00:00,0.0
40,0.0,40,9531,2021-05-10 02:00:00,0.0
41,0.0,41,9531,2021-05-10 03:00:00,0.0
42,0.0016835594981394644,42,9531,2021-05-10 04:00:00,0.0
43,0.04807132423975142,43,9531,2021-05-10 05:00:00,0.01917263984680176
44,0.2019059924841576,44,9531,2021-05-10 06:00:00,0.20261639404296874
45,0.4591377241020738,45,9531,2021-05-10 07:00:00,0.33280679321289064
46,0.7547477658079034,46,9531,2021-05-10 08:00:00,0.34174200439453123
47,1.068172900817906,47,9531,2021-05-10 09:00:00,0.9841751708984375

Expected behavior

No missing values (or maybe some fallbacks to handle missing values).

Test failing

The CI tests are failing right now.

Would be great to get them working again.

They were working on main last week, so I'm not sure what has changed here.

Any help would be appreciated

Add training/inference using OpenMeteo Open Dataset

OpenMeteo provides a newish public archive of multiple different NWP providers globally, see https://github.com/open-meteo/open-data for more on it. It is designed for more site-level access to the data, so seems like it would fit very well with Open Quartz Solar forecasting. The data is generally available from December 2023 to now.

Detailed Description

This would include adding a model training or inference script to pull from the OpenMeteo dataset.

Context

It could make it easier for more historical forecasts to be run as the archive gets larger, and easier to see how different providers might affect the forecasting models here.

bug: ___kmpc_for_static_fini


ImportError Traceback (most recent call last)
Cell In[1], line 8
5 site = PVSite(latitude=51.75, longitude=-1.25, capacity_kwp=1.25)
7 # run model, uses ICON NWP data by default
----> 8 predictions_df = run_forecast(site=site, ts='2023-11-01')

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/quartz_solar_forecast/forecast.py:31, in run_forecast(site, ts, nwp_source)
28 pv_xr = make_pv_data(site=site, ts=ts)
30 # load and run models
---> 31 pred_df = forecast_v1(nwp_source, nwp_xr, pv_xr, ts)
33 return pred_df

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/quartz_solar_forecast/forecasts/v1.py:20, in forecast_v1(nwp_source, nwp_xr, pv_xr, ts, model)
13 """
14 Run the forecast
15
16 This runs the pv-site-prediction model from the psp library.
17 """
19 if model is None:
---> 20 model = load_model(f"{dir_path}/../models/model-0.3.0.pkl")
22 # format pv and nwp data
23 pv_data_source = NetcdfPvDataSource(
24 pv_xr,
25 id_dim_name="pv_id",
(...)
28 ignore_pv_ids=[],
29 )

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/psp/serialization.py:25, in load_model(filepath)
22 def load_model(filepath: pathlib.Path | str) -> PvSiteModel:
23 # Use fsspec to support loading models from the cloud, using paths like "s3://..".
24 with fsspec.open(str(filepath), "rb") as f:
---> 25 (cls, attrs) = pickle.load(f)
27 model = cls.new(cls)
28 model.set_state(attrs)

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/psp/models/recent_history.py:13
11 from psp.data_sources.nwp import NwpDataSource
12 from psp.data_sources.pv import PvDataSource
---> 13 from psp.data_sources.satellite import SatelliteDataSource
14 from psp.models.base import PvSiteModel, PvSiteModelConfig
15 from psp.models.regressors.base import Regressor

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/psp/data_sources/satellite.py:1
----> 1 import pyresample
2 import xarray as xr
4 from psp.data_sources.nwp import NwpDataSource

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pyresample/__init__.py:27
25 from pyresample import geometry # noqa
26 from pyresample import grid # noqa
---> 27 from pyresample import image # noqa
28 from pyresample import kd_tree # noqa
29 from pyresample import plot # noqa

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pyresample/image.py:26
22 import warnings
24 import numpy as np
---> 26 from pyresample import geometry, grid, kd_tree
29 class ImageContainer(object):
30 """Holds image with geometry definition. Allows indexing with linesample arrays.
31
32 Parameters
(...)
54 Number of processor cores to be used for geometry operations
55 """

File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pyresample/kd_tree.py:31
28 from logging import getLogger
30 import numpy as np
---> 31 from pykdtree.kdtree import KDTree
33 from pyresample import CHUNK_SIZE, _spatial_mp, data_reduce, geometry
35 from .future.resamplers._transform_utils import lonlat2xyz

ImportError: dlopen(/Users/xxx/.pyenv/versions/3.11.2/lib/python3.11/site-packages/pykdtree/kdtree.cpython-311-darwin.so, 0x0002): symbol not found in flat namespace '___kmpc_for_static_fini'

Next Steps with Enphase Inverter

Detailed Description

Send the AUTHORIZATION_URL to the home owner / system owner to grant us permission to use their system data; they will then send us an ENPHASE_SYSTEM_ID. I currently don't have access to this since I don't know anyone who owns an Enphase inverter, but if anyone here could grant my AUTHORIZATION_URL permission to use their Enphase data, that would be really great!

Context

This is with reference to #106 that builds on #66 which fixes #36

Possible Implementation

#106

Define Test set

It would be very useful to define a test set to run evaluations on.

The idea is to use ~50 sites from the Passiv data (https://huggingface.co/datasets/openclimatefix/uk_pv) from 2021.

Ideally we should make a CSV with timestamp, id for all of these sites.

We could do the whole of 2021, but it might be better to take a random subset, say ~5,000 timestamps across all of those sites.
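The subset described above could be drafted with a short script like the following; the site ids, pair counts, and file name are all illustrative stand-ins, not the real Passiv ids:

```python
import csv
import random
from datetime import datetime, timedelta

random.seed(42)

# Hypothetical site ids standing in for ~50 Passiv sites
site_ids = [f"site_{i:03d}" for i in range(50)]

# All hourly timestamps in 2021
start = datetime(2021, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(365 * 24)]

# Random subset: 100 timestamps per site -> ~5,000 (timestamp, id) pairs
pairs = [(random.choice(timestamps), sid) for sid in site_ids for _ in range(100)]

with open("testset.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "id"])
    for ts, sid in pairs:
        writer.writerow([ts.isoformat(), sid])
```

Sampling per-site (rather than one global draw) keeps every site represented in the evaluation.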

Timestamps more than 6 months ago

Detailed Description

Currently this only works for timestamps less than 3 months old.
It would be good if it worked for more than that.

Context

We get most NWP variables from ICON, and just visibility from GFS. GFS history is limited to 3 months; ICON is limited to 6 months.
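One way to reason about this is a small helper that reports which archives could still cover a historic timestamp; the 90/180-day cutoffs below are rough stand-ins for the 3-month and 6-month limits mentioned above, not exact Open-Meteo guarantees:

```python
from datetime import datetime, timedelta

def available_nwp_sources(ts: datetime, now: datetime) -> list[str]:
    """Return which NWP archives could still cover `ts`,
    assuming ~3 months of GFS and ~6 months of ICON history."""
    age = now - ts
    sources = []
    if age <= timedelta(days=90):   # GFS: ~3 months
        sources.append("gfs")
    if age <= timedelta(days=180):  # ICON: ~6 months
        sources.append("icon")
    return sources
```

For example, a timestamp four months old would only qualify for ICON, and anything older than six months for neither, which is why backtests beyond that window need a different NWP archive.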

Possible Implementation

Quartz-Solar-Forecast_demo

Detailed Description

Given the absence of specific data for the model, I predicted photovoltaic power generation by applying preprocessing and feature-engineering techniques to data retrieved from https://dkasolarcentre.com.au/download?location=alice-springs.
This demonstration encompasses four distinct models: LSTM, LSTM+CNN, LSTM+CNN+Attention, and RNN.

Context

In this demo, extracting strongly correlated features from the cleaned data after preprocessing yields better prediction performance. In community projects, I believe data preprocessing and feature engineering are crucial. How to clean and merge features from different datasets through preprocessing operations in PSP will be a key point for the "Add other model architectures to Open Quartz Solar" project.

Possible Implementation

https://github.com/weiyang22/Open-Source-Quartz-Solar-Forecast/blob/main/Quartz-Solar-Forecast_demo.ipynb

Global implementations

Hi, first of all I'm not someone involved with the development of machine learning or AI; I'm an IoT engineer who is interested in integrating my home solar with this Quartz Solar Forecast. My question is: can this be used in Malaysia or any other region outside of the UK, or do I need to train the model on my own datasets? I would like to use Quartz Solar Forecast to get a prediction of how much energy my home solar can generate over the next 24/48 hours. Thank you :)

Update chart in README

The chart in the README currently shows what the results would be if the timestamp were set to two days before.

It would be great to update this chart in a similar format to reflect the actual results.

Fix tests

The CI tests are currently failing.

I think we need to add an __init__.py to the inverters folder.

This happened with #66.

Benchmark

Detailed Description

It would be great to benchmark the model

Context

Always good to benchmark

Possible Implementation

  • a baseline model could use the mean PV value; obviously this model will be bad, but it gives some context to the numbers in the evaluation
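The mean-value baseline suggested above fits in a few lines; the generation numbers here are synthetic, purely for illustration:

```python
# Naive baseline: predict the historical mean PV output for every timestamp.
def mean_baseline_mae(train, test):
    """MAE of always predicting the training-set mean."""
    prediction = sum(train) / len(train)
    return sum(abs(y - prediction) for y in test) / len(test)

# Synthetic generation values in kW, for illustration only
train = [0.0, 0.5, 1.2, 2.0, 1.1, 0.2]
test = [0.0, 1.0, 2.0, 1.0]

print(mean_baseline_mae(train, test))
```

Reporting the real model's MAE alongside this baseline's MAE immediately shows how much skill the model adds over "no model at all".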

Evaluate Model Using XAI and Increase Interpretability

Detailed Description

The current solar forecasting model is a gradient boosted tree model, which can achieve high predictive accuracy but often lacks interpretability. It is proposed to evaluate the model using Explainable AI (XAI) techniques and increase its interpretability. This will involve:

  1. Researching and evaluating XAI techniques suited to gradient boosted tree models, such as feature importance analysis, Shapley values (SHAP), local interpretable model-agnostic explanations (LIME), or Microsoft's Explainable Boosting Machine.
  2. Implementing the selected XAI techniques and integrating them into the existing model evaluation and analysis pipeline.
  3. Analyzing and visualizing the model's behavior, feature importances, and decision-making process using the XAI techniques.

Context

Understanding the model's decision-making process and the relative importance of input features is crucial for trust, transparency, and accountability in this open-source project. Increasing the model's interpretability using XAI techniques can:

• Facilitate the integration of domain knowledge, potentially improving model performance and interpretability.
• Enhance transparency and trust among users and stakeholders.
• Guide model refinement and improvement efforts based on insights gained from XAI analysis.
• Assess the model's robustness and fairness across different geographic regions or weather conditions, identifying potential biases or inconsistencies.

Possible Implementation

  1. Evaluate and select XAI techniques like SHAP or LIME for interpretability analysis of the gradient boosted tree model.
  2. Develop a separate module or script to integrate the chosen XAI techniques with the existing model evaluation pipeline.
  3. Visualize and analyze feature importances, decision paths, and local explanations using the XAI techniques.
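As a cheap first step before SHAP or LIME, scikit-learn's built-in impurity-based feature importances already give a view of a gradient boosted tree model. This toy sketch uses synthetic data, and the feature names are illustrative stand-ins for the real NWP inputs:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins for NWP features; the names are illustrative only
feature_names = ["irradiance", "cloud_cover", "temperature", "visibility"]
X = rng.uniform(size=(500, 4))
# Target depends mostly on "irradiance", a little on "cloud_cover"
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + 0.05 * rng.standard_normal(500)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Impurity-based importances, normalised to sum to 1
for name, imp in zip(feature_names, model.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

SHAP's TreeExplainer would go further, attributing each individual prediction to its inputs rather than giving one global ranking.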

Add option to add live PV

Detailed Description

We could add an option that feeds in live PV data from near the time of inference.

Context

This has been done in pv-site-prediction

Possible Implementation

It needs to be added to the PV data going into the model here

Add python3.12

Detailed Description

It would be good if this ran on Python 3.12

Possible Implementation

  • run the examples locally on Python 3.12
  • run the tests locally on Python 3.12
  • add Python 3.12 to the CI
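The CI step might look like this as a GitHub Actions matrix; the workflow file name and existing Python versions here are assumptions about the repo, not its actual configuration:

```yaml
# .github/workflows/test.yml (illustrative)
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e . && pytest
```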

Unable to access ocf_datapipes, encountering gcp permission error

Describe the bug

I am trying to access ocf_datapipes in the project using pv.netcdf in the config file.

code

import ocf_datapipes  # noqa
from ocf_datapipes.training.example.simple_pv import simple_pv_datapipe

import os
import certifi
import ssl

os.environ['SSL_CERT_FILE'] = certifi.where()
ssl._create_default_https_context = ssl._create_unverified_context

config_file = 'pv_config.yaml'
data_pipe = simple_pv_datapipe(configuration_filename=config_file)

try:
    for batch in data_pipe:
        print(batch)  # for debugging: print each batch
except Exception as e:
    print(f"Encountered an error: {e}")

Screenshot

Screenshot 2024-03-06 at 4 02 40 PM

Adding tests for enphase inverter API

Add Tests for Inverter API (for #129 )

Detailed Description

A test of the make_pv_data function, with fake data returned when the get_enphase_data function is called

Context

Adding tests for the enphase inverter API is crucial to ensure the correctness and reliability of the code. It will help catch any regressions or issues early in the development process and provide confidence in the functionality of the API.
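The test could follow the standard mock-the-fetcher pattern. To keep this sketch self-contained it defines stand-ins for both functions; in the repo you would import the real make_pv_data and patch get_enphase_data at its actual module path instead:

```python
import sys
from unittest.mock import patch

def get_enphase_data(system_id):
    # Stand-in for the real fetcher, which hits the live Enphase API
    raise RuntimeError("would hit the live Enphase API")

def make_pv_data(system_id):
    # Stand-in for the real function, which reshapes the data for the model
    return get_enphase_data(system_id)

def test_make_pv_data_uses_fake_enphase_data():
    fake = [{"timestamp": "2024-01-01T12:00:00Z", "power_kw": 1.2}]
    this_module = sys.modules[__name__]
    # Patch the fetcher so no network call is made
    with patch.object(this_module, "get_enphase_data", return_value=fake) as mock:
        result = make_pv_data("fake-system-id")
    mock.assert_called_once_with("fake-system-id")
    assert result == fake

test_make_pv_data_uses_fake_enphase_data()
```

Because the network call is mocked out, the test runs in CI without Enphase credentials and still verifies that make_pv_data passes the system id through and uses the returned data.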

readme updates

  • add ICON to the Abbreviations section
  • update the README to note that the Open-Meteo ICON data is only available for the last 6 months
  • after #21, add installation instructions (see #32)
  • #34
  • add instructions on how to install from PyPI

Mean irradiance data

We've noticed that some of the ICON DWD Hugging Face irradiance data is the mean since forecast initialisation, not the hourly average

Detailed Description

import xarray as xr
import ocf_blosc2  # registers the blosc2 codec needed to read the zarr

file = 'zip:///::hf://datasets/openclimatefix/dwd-icon-eu/data/2021/1/1/20210101_00.zarr.zip'
data = xr.open_zarr(file, chunks="auto")
dd = data['aswdifd_s']  # surface downward diffuse shortwave radiation
dd.mean(dim=['latitude', 'longitude']).plot()
Screenshot 2024-04-22 at 17 37 29

Context

Our model is trained on hourly average data, so this may cause the evaluation to underperform

Possible Implementation

  • Transform the data back to hourly values: I think you take the differences of the running totals, and multiply by the number of datapoints
  • Check the live data doesn't have this problem
  • Check other variables don't have this problem
  • Check other data on Hugging Face; do we also see this? It looks similar
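The first bullet can be sketched as follows, assuming the stored series m_k really is the mean since forecast initialisation (check the ICON docs for the exact convention); then the per-step value is x_k = k*m_k - (k-1)*m_{k-1}:

```python
import numpy as np

def deaccumulate_mean(mean_since_init):
    """Recover per-step (hourly) values x_k from a running mean m_k,
    via x_k = k*m_k - (k-1)*m_{k-1}."""
    m = np.asarray(mean_since_init, dtype=float)
    k = np.arange(1, len(m) + 1)
    totals = k * m                       # running totals k * m_k
    return np.diff(totals, prepend=0.0)  # differences give per-step values

# Round trip: hourly values -> running mean -> recovered hourly values
hourly = np.array([0.0, 100.0, 250.0, 300.0])
running_mean = np.cumsum(hourly) / np.arange(1, 5)
print(deaccumulate_mean(running_mean))
```

This is exactly "take the differences and multiply by the number of datapoints", just done by converting the means back to running totals first.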

GSoC Project Solar Inverters: collecting the last 30 mins average power

The Open Source Quartz Solar Forecast requires the following variables:

  1. recent_power: The mean power over the last 30 minutes
  2. h_mean: The mean of the recent pv data over the last 7 days
  3. h_median: The median of the recent pv data over the last 7 days
  4. h_max: The max of the recent pv data over the last 7 days

For the time being, our priority is gathering the last 30 minutes' average power, feeding it to the model, and testing it

Context

This will allow the user to get the most recent data from their Enphase system

Possible Implementation

Hit the /api/v4/systems/{system_id}/telemetry/production_micro endpoint of the Enphase v4 API, calibrate it by testing different start/end times with the correct granularity (5 or 15 minutes), and convert the response to the data format needed by pv-site-prediction
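The four model inputs listed above could be derived from raw telemetry roughly as follows; this is a pure-Python sketch over (timestamp, power) tuples, and the real pipeline would parse the actual Enphase response format instead:

```python
from datetime import datetime, timedelta
from statistics import mean, median

def pv_features(readings, now):
    """readings: list of (timestamp, power_kw) tuples; `now` = inference time.
    Returns the four inputs named in the issue."""
    last_30m = [p for t, p in readings if now - t <= timedelta(minutes=30)]
    last_7d = [p for t, p in readings if now - t <= timedelta(days=7)]
    return {
        "recent_power": mean(last_30m) if last_30m else None,
        "h_mean": mean(last_7d) if last_7d else None,
        "h_median": median(last_7d) if last_7d else None,
        "h_max": max(last_7d) if last_7d else None,
    }

now = datetime(2024, 5, 1, 12, 0)
readings = [
    (now - timedelta(minutes=5), 2.0),
    (now - timedelta(minutes=25), 1.0),
    (now - timedelta(days=2), 3.0),
    (now - timedelta(days=10), 9.0),  # older than 7 days, ignored
]
print(pv_features(readings, now))
```

With 5- or 15-minute telemetry granularity, the 30-minute window holds only a handful of samples, so the choice of start/end times matters, hence the calibration step above.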
